Commit graph

4493 Commits

Author SHA1 Message Date
Ralph Castain
f259d50ed7 Fully fix the PMI2 warning - turned out to be larger than originally thought due to the way the function was being handled across multiple files. Properly resolve the problem by not compiling the file if PMI2 is not desired, and then appropriately setting the visibility of the function within the module
Refs trac:4400

This commit was SVN r31084.

The following Trac tickets were found above:
  Ticket 4400 --> https://svn.open-mpi.org/trac/ompi/ticket/4400
2014-03-17 17:36:37 +00:00
Ralph Castain
e152449be4 Silence warning
cmr=v1.7.5:reviewer=ompi-gk1.7

This commit was SVN r31083.
2014-03-17 17:05:24 +00:00
Ralph Castain
b248b27637 Remove a check that prevented mpirun from exiting when it should in the single-node case
Refs trac:4393

This commit was SVN r31080.

The following Trac tickets were found above:
  Ticket 4393 --> https://svn.open-mpi.org/trac/ompi/ticket/4393
2014-03-15 15:25:44 +00:00
Ralph Castain
fbc5e3b773 Deal with the corner case where we encounter an error when attempting to launch a daemon. In this case, we will order abnormal termination before daemons callback to us, and thus any attempt to send them a "die" message will fail. Ensure that mpirun at least exits cleanly in this scenario, thereby allowing the remote daemons that did get launched to commit suicide when comm fails.
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31068.
2014-03-14 15:32:30 +00:00
Ralph Castain
2abed09d7c Continue to resolve priority issues. Cleanup the case of forced termination in mpirun during launch processing by ensuring we can respond to socket closures, and ensuring that the remote daemons correctly close their sockets when terminating.
Jeff: please test a variety of conditions to ensure we get this right

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31058.
2014-03-13 04:02:24 +00:00
Ralph Castain
ac421c931d The random number generator changes were incomplete (typo errors) in some places, and were missing the required declspecs for visibility.
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31053.
2014-03-12 22:37:27 +00:00
Ralph Castain
f56f37d364 Shifting to an event-driven RTE raises some interesting issues during shutdown. We want the last messages to get thru, but also need to correctly shutdown the virtual machine. This requires a delicate balancing act across event priorities, and the need to check for termination conditions in places where related events get processed.
Change the priority of comm_failure and job_termination events to ensure we process final messages prior to terminating. Check for termination conditions when processing proc termination events as we may order proc termination when the daemon gets an exit command, but we can't see the proc actually terminate until we get out of that message event.

Jeff: probably easiest to review this by testing. I tested it under both Slurm and rsh on v1.7.5 as well as trunk

cmr=v1.7.5:reviewer=jsquyres:subject=resolve event priorities during VM shutdown

This commit was SVN r31042.
2014-03-12 16:49:58 +00:00
Ralph Castain
a254d2db34 Silence warning when CR is not enabled
This commit was SVN r31025.
2014-03-12 13:47:03 +00:00
Adrian Reber
4512b3375e OOB/TCP: wire up the existing ft_event() function
This commit was SVN r31022.
2014-03-12 12:47:20 +00:00
Adrian Reber
34625b360b use the newly created JOB_STATE_FT_* events
This commit was SVN r31021.
2014-03-12 12:37:14 +00:00
Adrian Reber
8d40cd53ae use the existing pretty-print function for information about the job state
This commit was SVN r31020.
2014-03-12 12:34:25 +00:00
Ralph Castain
7869402f5f Sigh - looks like I did too good a job of turning things off. Back some of it out in favor of trying again when more time is available
Refs trac:4368

This commit was SVN r31017.

The following Trac tickets were found above:
  Ticket 4368 --> https://svn.open-mpi.org/trac/ompi/ticket/4368
2014-03-12 02:10:35 +00:00
Ralph Castain
dc28015bcb Something funny is going on when --without-orte, so revert the orte/Makefile.am for now while we try to figure it out
Refs trac:4368

This commit was SVN r31011.

The following Trac tickets were found above:
  Ticket 4368 --> https://svn.open-mpi.org/trac/ompi/ticket/4368
2014-03-11 23:07:21 +00:00
Ralph Castain
9c66c4f439 Correctly implement --disable-oshmem and --without-orte so we don't build the disabled section of code. Fix a bunch of code rot in the PMI rte component, and add several missing headers when building --without-orte.
NOTE: I transferred the oshmem-disabled-by-default from the 1.7 branch to the trunk to minimize future disruption if/when we change that option.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31006.
2014-03-11 22:02:40 +00:00
Adrian Reber
49173ccd61 add debug output for the ft_event handler
This commit was SVN r30990.
2014-03-11 15:39:16 +00:00
Adrian Reber
7304b700e1 Fix the newly added FT event state when compiling --with-ft
This commit was SVN r30988.
2014-03-11 13:20:08 +00:00
Ralph Castain
8e080fb95e Need a slightly different header
This commit was SVN r30986.
2014-03-11 03:03:12 +00:00
Ralph Castain
2cd1cfc7fe Remove this ignore for now
This commit was SVN r30985.
2014-03-11 03:02:13 +00:00
Ralph Castain
103a5c6df1 Output the bindings if ess verbosity is high enough
Refs trac:4356

This commit was SVN r30982.

The following Trac tickets were found above:
  Ticket 4356 --> https://svn.open-mpi.org/trac/ompi/ticket/4356
2014-03-11 01:21:14 +00:00
Ralph Castain
176b326c27 Add a comment to make Jeff happier...
Refs trac:4340

This commit was SVN r30980.

The following Trac tickets were found above:
  Ticket 4340 --> https://svn.open-mpi.org/trac/ompi/ticket/4340
2014-03-10 23:02:04 +00:00
Ralph Castain
081669b440 When pretty-printing binding info, we need to pass the topology down to the routine as the mapper isn't always working with the local topology - otherwise, we get an erroneous help message. Thanks to Tetsuya Mishima for reporting it
cmr=v1.7.5:reviewer=rhc:subject=fix pretty-print of bindings

This commit was SVN r30968.
2014-03-10 15:53:07 +00:00
Adrian Reber
b51733c456 fix "warning: 'sstore_stage_select' defined but not used"
In the function sstore_stage_select() the local variables
were set up and defined. Unfortunately this function was
never called. This patch moves variable set up to the
sstore_stage_register() function and checks the return
values of the variable initialization.

This commit was SVN r30958.
2014-03-06 16:53:27 +00:00
Ralph Castain
7a44af375c Add an FT event state and set the state machine to callback to the OOB base ft event when activated
This commit was SVN r30950.
2014-03-06 02:44:29 +00:00
Ralph Castain
9793909988 Correct the constant we check for an error. Thanks to George for noticing it.
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30949.
2014-03-06 02:21:27 +00:00
Ralph Castain
fc2dd6ac48 Per Jeff's request, add a more detailed comment as to why we are turning off the warning at this time.
Refs trac:4339

This commit was SVN r30948.

The following Trac tickets were found above:
  Ticket 4339 --> https://svn.open-mpi.org/trac/ompi/ticket/4339
2014-03-06 02:17:25 +00:00
Ralph Castain
c9465d97b4 Resolve a race condition when responding to a SIGTERM to ensure that any final message from the application is correctly output. Remove a duplicate command, reduce the priority of the daemon exit command to MSG so that the IOF will have a chance to output cached messages. Update the signal trapping test.
Thanks to Paul Kapinos for reporting the problem.

cmr=v1.7.5:reviewer=jsquyres:subject=resolve a race condition

This commit was SVN r30942.
2014-03-05 04:38:17 +00:00
Ralph Castain
a2b539c763 Per the telecon, silence the warning for 1.7.5 to give us time to consider a better permanent solution
Refs trac:4339

This commit was SVN r30941.

The following Trac tickets were found above:
  Ticket 4339 --> https://svn.open-mpi.org/trac/ompi/ticket/4339
2014-03-05 03:02:29 +00:00
Ralph Castain
50c30d62ca Repair builds without hwloc
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30940.
2014-03-05 02:48:15 +00:00
Adrian Reber
e5bef82ee1 OPAL_ENABLE_FT_CR: remove compiler warnings
When compiling --with-ft there are a few compiler warnings about
unused variables. This patch fixes those compiler warnings.

This commit was SVN r30927.
2014-03-04 15:28:07 +00:00
Ralph Castain
da4cb39683 If we can't find a route to communicate, emit an error message rather than just exiting with a non-zero status
cmr=v1.7.5:reviewer=jsquyres:subject=print error if cannot communicate

This commit was SVN r30922.
2014-03-04 04:57:53 +00:00
Ralph Castain
0ac97761cc Now that we are binding by default, the issue of #slots and what to do when oversubscribed has become a bit more complicated. This isn't a problem in managed environments as we are always provided an accurate assignment for the #slots, or when -host is used to define the allocation since we automatically assume one slot for every time a node is named.
The problem arises when a hostfile is used, and the user provides host names without specifying the slots= parameter. In these cases, we assign slots=1, but automatically allow oversubscription since that number isn't confirmed. We then provide a separate parameter by which the user can direct that we assign the number of slots based on the sensed hardware - e.g., by telling us to set the #slots equal to the #cores on each node. However, this has been set to "off" by default.

In order to make this a little less complex for the user, set the default such that we automatically set #slots equal to #cores (or #hwt's if use_hwthreads_as_cpus has been set) only for those cases where the user provides names in a hostfile but does not provide slot information.

Also clean up a couple of issues in the mapping/binding system:

* ensure we only override the binding directive if we are oversubscribed *and* overload is not allowed

* ensure that the MPI procs don't attempt to bind themselves if they are launched by an orted as any binding directive (no matter what it was) would have been serviced by the orted on launch

* minor cleanup to the warning message when oversubscribed and binding was requested

cmr=v1.7.5:reviewer=rhc:subject=update mapping/binding system

This commit was SVN r30909.
2014-03-03 16:46:37 +00:00
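The slot-defaulting rule described in commit 0ac97761cc can be sketched in a few lines of Python. This is an illustration only, not the ORTE implementation: the function name `effective_slots` and its parameters are hypothetical, and it assumes the core and hardware-thread counts of the node have already been sensed.

```python
from typing import Optional


def effective_slots(declared_slots: Optional[int],
                    cores: int,
                    hwthreads: int,
                    use_hwthreads_as_cpus: bool = False) -> int:
    """Return the slot count for one hostfile entry (hypothetical helper).

    declared_slots is the value of slots= in the hostfile, or None when
    the user listed the host name without slot information.
    """
    if declared_slots is not None:
        # The user provided slots=N: honor it as given.
        return declared_slots
    # No slot information: default to the sensed hardware, counting
    # hardware threads only when they are to be treated as cpus.
    return hwthreads if use_hwthreads_as_cpus else cores
```

For a node with 8 cores and 16 hardware threads, a bare host name would get 8 slots under this sketch, or 16 when hardware threads are treated as cpus; an explicit slots=2 stays at 2.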
Ralph Castain
88b0e0cc6d Allow the user to turn off the oversubscribed-binding warning if overload-allowed has been provided
Refs trac:4317

This commit was SVN r30892.

The following Trac tickets were found above:
  Ticket 4317 --> https://svn.open-mpi.org/trac/ompi/ticket/4317
2014-02-28 17:55:53 +00:00
Ralph Castain
4a645f0342 Add detection of oversubscription with binding requested - if binding requested to core or hwt, warn and do not bind or else we will hurt performance. Also, if no binding directive was given, turn off the default binding
Refs trac:4317

This commit was SVN r30888.

The following Trac tickets were found above:
  Ticket 4317 --> https://svn.open-mpi.org/trac/ompi/ticket/4317
2014-02-28 16:08:52 +00:00
Ralph Castain
8500247c7b Fix the by-obj mapper in the case where slots are not specified, and so we are in a perpetual oversubscribed state
cmr=v1.7.5:reviewer=rhc

This commit was SVN r30887.
2014-02-28 05:21:46 +00:00
Ralph Castain
a4c3d0a5a0 Add some more debug to the by-obj mapper
This commit was SVN r30884.
2014-02-28 02:52:53 +00:00
Ralph Castain
d109c523b9 Per patch from Tetsuya Mishima, complete the overhaul of the round-robin mappers
Refs trac:4296

This commit was SVN r30861.

The following Trac tickets were found above:
  Ticket 4296 --> https://svn.open-mpi.org/trac/ompi/ticket/4296
2014-02-27 00:43:53 +00:00
Ralph Castain
61a21e4f31 Based on Tetsuya's patch, with some changes, correct the case of map-by node where multiple cpus/rank are requested and result in a non-integer match with num slots. Also correct tests for binding policy given to use the proper macro.
Refs trac:4296

This commit was SVN r30857.

The following Trac tickets were found above:
  Ticket 4296 --> https://svn.open-mpi.org/trac/ompi/ticket/4296
2014-02-26 18:12:23 +00:00
Ralph Castain
b880aa46bd Update the map-by obj and map-by obj:span mappers to correct for errors in computing carryover across the nodes. Be a little less complex in the algorithm so it is easier to follow and debug.
Refs trac:4296

This commit was SVN r30826.

The following Trac tickets were found above:
  Ticket 4296 --> https://svn.open-mpi.org/trac/ompi/ticket/4296
2014-02-25 23:32:43 +00:00
Joshua Ladd
9ea9bec4ad Addressing Jeff's comments:
1. Changed rng_buff_t --> opal_rng_buff_t
2. All global variables obey the prefix rule
3. Old code has been removed 
4. Found a couple of unnecessary includes

Refs trac:4298

This commit was SVN r30807.

The following Trac tickets were found above:
  Ticket 4298 --> https://svn.open-mpi.org/trac/ompi/ticket/4298
2014-02-24 23:18:35 +00:00
Joshua Ladd
e39d9f4080 Per the RFC schedule, add an additive lagged Fibonacci parallel random number generator to OPAL. In order to use, please add the following header to your code: opal/util/alfg.h. See ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c for an example how to seed with opal_srand and invoke the generator with opal_rand. This should be added to
cmr=v1.7.5:reviewer=rhc:subject=Add an OPAL RNG

This commit was SVN r30801.
2014-02-23 21:41:38 +00:00
Ralph Castain
c8112c1086 Loadbalancing across nodes (i.e., map-by node) wasn't working correctly - the algorithm relied on the nodes being defined in descending order of slots, or the number of slots remaining to be assigned being only one/node. Regardless, it didn't work for the case where nodes were defined in ascending order of slots.

Tetsuya's proposed patch didn't solve the problem for me, but it did correct the case where cpus/proc > 1. The final patch requires that we loop over the assignment algo until all procs are assigned or all nodes are filled - any remaining procs are then handled in the cleanup loop.

cmr=v1.7.5:reviewer=rhc:subject=fix map-by node for different cases

This commit was SVN r30798.
2014-02-22 16:39:41 +00:00
Adrian Reber
f17ec1ab10 ESS/BASE: orte-restart needs sstore
Running orte-restart requires an initialized sstore.
This opens the sstore component for FT builds just like
the snapc component.

This commit was SVN r30796.
2014-02-21 21:23:26 +00:00
Ralph Castain
0319d5fb19 Seeing some errors coming out of MTT on this component, so turn it off for now and will debug later
This commit was SVN r30789.
2014-02-21 16:31:52 +00:00
Mike Dubman
8d4592a94b rmaps/mindist: better error message
better error message when there is only one socket available

fixed by Elena, reviewed by Miked
cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30787.
2014-02-21 11:38:35 +00:00
Ralph Castain
5520d6971b We do have to track the origin of messages sent over usock as the daemon does route them back down, and we need to get the "sender" info correct. Also do a better job of dealing with simultaneous connections to avoid binding to a used socket.
Refs trac:4280

This commit was SVN r30781.

The following Trac tickets were found above:
  Ticket 4280 --> https://svn.open-mpi.org/trac/ompi/ticket/4280
2014-02-20 17:27:05 +00:00
Ralph Castain
63803f5e61 Fix the leader data for PMI direct-launch as well
This commit was SVN r30778.
2014-02-20 01:41:19 +00:00
Ralph Castain
418ca60776 Since we don't know the name of the local leader, store that info under our own name :-)
This commit was SVN r30777.
2014-02-20 01:39:52 +00:00
Ralph Castain
262c927778 Define a new key and store the process name of the local_rank=0 process on each node so that the MPI layer can retrieve it as desired.
This commit was SVN r30759.
2014-02-18 00:32:58 +00:00
Adrian Reber
6b45d475e9 Fix compiler warnings when compiling with --with-ft
With enabled fault tolerance code different functions
are selected during compilation. Most of the ft
code is #ifdef'd out. This #ifdef's more code out
so that compiler warnings like 

warning: unused variable 'item' [-Wunused-variable]
     opal_list_item_t *item;

are removed.

This commit was SVN r30747.
2014-02-17 10:53:44 +00:00
Ralph Castain
c3df744a3b Shift the orte_db_localrank key to the opal level. Add the job and proc-level session directory names to the database using opal_db keys.
This commit was SVN r30746.
2014-02-17 01:40:56 +00:00
Ralph Castain
ea0217c337 Remove unused file and minimize the usock uri contribution (add explanation as to why)
Refs trac:4280

This commit was SVN r30744.

The following Trac tickets were found above:
  Ticket 4280 --> https://svn.open-mpi.org/trac/ompi/ticket/4280
2014-02-16 22:37:30 +00:00
Ralph Castain
a91d358c48 Add/modify a couple of tests
This commit was SVN r30743.
2014-02-16 20:54:34 +00:00
Ralph Castain
d42f4be8a4 Add unix socket component to OOB - no longer require active network for local operations. Demonstrate inter-transport crossover.
VERY tentatively schedule this for 1.7.5 - only to be applied if we see no troubles AND the branch is ready in advance.

cmr=v1.7.5:reviewer=rhc:subject=Add unix socket component to OOB

This commit was SVN r30742.
2014-02-16 20:54:12 +00:00
Ralph Castain
14bb7a117c Fix bugs in the oob base - ensure we get the components in high-to-low priority, and that we correctly track reachability via all components. Adjust the priority of the tcp component to leave headroom for others
Refs trac:267

This commit was SVN r30740.

The following Trac tickets were found above:
  Ticket 267 --> https://svn.open-mpi.org/trac/ompi/ticket/267
2014-02-16 03:19:08 +00:00
Ralph Castain
509d5d82b0 Add some verbiage requested by Jeff, change the param level to something...?
Refs trac:4275

This commit was SVN r30736.

The following Trac tickets were found above:
  Ticket 4275 --> https://svn.open-mpi.org/trac/ompi/ticket/4275
2014-02-15 15:11:05 +00:00
Ralph Castain
3f9db36e0d Make Jeff smile - pretty-up the indentation
Refs trac:4267

This commit was SVN r30733.

The following Trac tickets were found above:
  Ticket 4267 --> https://svn.open-mpi.org/trac/ompi/ticket/4267
2014-02-14 23:25:48 +00:00
Ralph Castain
91f90058ce Add missing options and cleanup the code a bit. Default to by-slot ranking if a non-hardware option isn't given. Thanks to Tetsuya Mishima for the assist.
cmr=v1.7.5:reviewer=ompi-gk1.7

This commit was SVN r30725.
2014-02-14 10:23:16 +00:00
Ralph Castain
fd9b301a8b Check equality instead of bit-mask - thanks to Tetsuya Mishima for reporting it
cmr=v1.7.5:reviewer=ompi-gk1.7

This commit was SVN r30722.
2014-02-14 02:34:42 +00:00
Ralph Castain
4e1c07cbf2 If we are given a TCP oob address that doesn't match any active module, it is still possible that we could route to the address if a router is in the system. No harm in trying, so arbitrarily pick the first connection in the active module list and assign the peer to it. If that module can't reach it, we'll follow the usual failover mechanism until finally concluding that nobody can get there.
cmr=v1.7.5:reviewer=jsquyres:subject=handle non-matching addresses

This commit was SVN r30719.
2014-02-13 23:37:22 +00:00
Ralph Castain
449cd8f3d7 Update a couple of fields, add a scheduler field to proc_info
This commit was SVN r30718.
2014-02-13 23:30:04 +00:00
Ralph Castain
fc6101b508 Handle "localhost" better
Refs trac:4263

This commit was SVN r30702.

The following Trac tickets were found above:
  Ticket 4263 --> https://svn.open-mpi.org/trac/ompi/ticket/4263
2014-02-12 20:30:39 +00:00
Ralph Castain
a8a9801a0b Ensure an orted exits with non-zero status if it is unable to send a message. Add more diagnostic messages to the OOB set_addr code
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30701.
2014-02-12 19:44:01 +00:00
Ralph Castain
1473dde6ea Okay, once again be caught by the blasted hwloc inability to cleanly handle caches. Protect the calls to get_depth by first checking to see if it is a "cache", then use a cache-specific function to get the stupid data. Very, very irritating.
cmr=v1.7.5:reviewer=jsquyres:subject=treat caches as something different yet again

This commit was SVN r30693.
2014-02-12 01:45:06 +00:00
Ralph Castain
1565816988 Do a little better job of cleaning up the session directory left by mpirun by ensuring we delete the event associated with debugger attachment and unlinking the pipe used for that purpose. Also, we no longer leave "abort" files around, so remove that check when deleting session directory trees
cmr=v1.7.5:reviewer=jsquyres:subject=cleanup session directories better

This commit was SVN r30689.
2014-02-11 22:16:17 +00:00
Ralph Castain
fa7b686ccc Provide better messages when we don't find any included interfaces, and/or don't find any interfaces for use by OOB.
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30675.
2014-02-11 19:29:03 +00:00
Ralph Castain
b566cd5e30 Protect against no modifiers
Refs trac:4117

This commit was SVN r30672.

The following Trac tickets were found above:
  Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117
2014-02-11 17:34:37 +00:00
Ralph Castain
6fa34407bf Handle modifiers to the --map-by dist option
Refs trac:4117

This commit was SVN r30671.

The following Trac tickets were found above:
  Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117
2014-02-11 17:19:05 +00:00
Ralph Castain
4781ea71b6 Correct the handling of various map/bind combinations when pe=N is given. Thanks to Elena Elkina for reporting it.
Refs trac:4117

This commit was SVN r30663.

The following Trac tickets were found above:
  Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117
2014-02-11 03:05:26 +00:00
Ralph Castain
707e51d786 Check for --cpus-per-proc earlier, before the correct option can be processed. Thanks to Tetsuya Mishima for reporting it.
Refs trac:4117

This commit was SVN r30662.

The following Trac tickets were found above:
  Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117
2014-02-11 02:53:53 +00:00
Ralph Castain
d66d2f5fb3 It is just fine to map by node or slot and bind, so ensure the switch statement includes those options. Thanks to Tetsuya Mishima for pointing it out.
Refs trac:4240

This commit was SVN r30661.

The following Trac tickets were found above:
  Ticket 4240 --> https://svn.open-mpi.org/trac/ompi/ticket/4240
2014-02-11 02:52:01 +00:00
Ralph Castain
a49e0db8dd We haven't supported a c++ wrapper for ORTE in quite some time
cmr=v1.7.5:reviewer=ompi-gk1.7:subject=remove c++ cruft

This commit was SVN r30653.
2014-02-10 17:16:30 +00:00
Ralph Castain
1a12325094 Rats - need to include bydist in the mapping list
Refs trac:4117

This commit was SVN r30649.

The following Trac tickets were found above:
  Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117
2014-02-09 16:17:05 +00:00
Ralph Castain
0dc5f50d27 Add a plm component for local-only operation that doesn't require rsh/ssh to be installed. Requested by Fedora packagers for testing purposes.
cmr=v1.7.5:reviewer=jsquyres:subject=Add a plm component for local-only operation

This commit was SVN r30645.
2014-02-09 15:53:10 +00:00
Ralph Castain
ca0c806662 Resolve the problem of binding in inverted topologies - check the relative depth of the map and bind objects in the topology, and let that determine whether we bind downward or upwards.
cmr=v1.7.5:reviewer=jsquyres:subject=Resolve the problem of binding in inverted topologies

This commit was SVN r30643.
2014-02-09 05:30:17 +00:00
Ralph Castain
0ee38353ba In case there are stale session directories around, do a purge of the relevant session directory tree when an orted, HNP, or singleton start. This won't help in the case of direct-launched apps, but it's the best we can do.
cmr=v1.7.5:reviewer=jsquyres:subject=purge stale session dirs at startup

This commit was SVN r30642.
2014-02-09 02:10:31 +00:00
Ralph Castain
1d8c061687 Fix a race condition that could result in assert failures during finalize. Ensure we shutdown the orte progress thread prior to finalizing the rml/oob frameworks so that no async operations are executing during destruct of the base-level lists and objects.
cmr=v1.7.5:reviewer=jsquyres:subject=fix race condition in finalize

This commit was SVN r30641.
2014-02-08 22:04:19 +00:00
Ralph Castain
5b8e1180cf Update a test
This commit was SVN r30640.
2014-02-08 22:00:12 +00:00
Ralph Castain
a94920276d Fix singleton MPI_Abort. Singletons no longer immediately start an HNP, but only launch one when they need it for comm_spawn. So there isn't anyone to send the "abort" report to, and thus we just exit after emitting our message.
cmr=v1.7.5:reviewer=jsquyres:subject=Fix singleton MPI_Abort

This commit was SVN r30635.
2014-02-08 18:15:07 +00:00
Ralph Castain
bc7cc09749 After a lot of pain, I've managed to resolve the problem of conflicting mapping directives caused by mismatched MCA params - i.e., where someone has one variant of an MCA param (e.g., rmaps_base_mapping_policy) in their default MCA param file, and then specifies another variant (e.g., --npernode) on the command line. I can't fully resolve the problem as there is no way to know precisely what the user meant - we can only guess which param was really intended since the MCA param system can't apply its normal precedence rules.

So...print a big "deprecated" warning for the old params and error out if a conflict is detected. I know that isn't what people really wanted, but it's the best we can do. If only the old style param is given, then process it after the warning.

Extend the current map-by param to add support for ppr and cpus-per-proc, adding the latter to the list of allowed modifiers using "pe=n" for processing elements/proc. Thus, you can map-by socket:pe=2,oversubscribe to map by socket, binding 2 processing elements/process, with oversubscription allowed. Or you can map-by ppr:2:socket:pe=4 to map two processes to every socket in the allocation, binding each process to 4 processing elements.

For those wondering, a processing element is defined as a hwthread if --use-hwthreads-as-cpus is given, or else as a core.

Refs trac:4117

This commit was SVN r30620.

The following Trac tickets were found above:
  Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117
2014-02-07 21:25:40 +00:00
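The map-by syntax described in commit bc7cc09749 can be illustrated with a small parser. This is a sketch for reading the two example directives from the commit message, not the parser used in ORTE: the `MapByDirective` class and `parse_map_by` function are hypothetical names, and the handling of modifiers other than `pe=n` (stored as plain flags) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class MapByDirective:
    target: str                               # object to map by, e.g. "socket"
    procs_per_resource: Optional[int] = None  # N from "ppr:N:<object>"
    pe: Optional[int] = None                  # processing elements per process
    flags: Set[str] = field(default_factory=set)  # e.g. {"oversubscribe"}


def parse_map_by(spec: str) -> MapByDirective:
    """Parse a map-by string such as "socket:pe=2,oversubscribe"."""
    parts = spec.split(":")
    if parts[0] == "ppr":
        # Form "ppr:N:<object>[:modifiers]"
        if len(parts) < 3:
            raise ValueError("ppr requires a count and an object: " + spec)
        directive = MapByDirective(target=parts[2],
                                   procs_per_resource=int(parts[1]))
        modifier_fields = parts[3:]
    else:
        # Form "<object>[:modifiers]"
        directive = MapByDirective(target=parts[0])
        modifier_fields = parts[1:]

    for field_text in modifier_fields:
        for modifier in field_text.split(","):
            if not modifier:
                continue
            key, _, value = modifier.partition("=")
            if key.lower() == "pe":
                directive.pe = int(value)
            else:
                directive.flags.add(key.lower())
    return directive
```

With this sketch, `parse_map_by("socket:pe=2,oversubscribe")` gives a socket mapping with two processing elements per process and oversubscription allowed, and `parse_map_by("ppr:2:socket:pe=4")` gives two processes per socket with four processing elements each.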
Ralph Castain
c617d66d98 Paul Hargrove has pointed out that some big SMP systems (e.g., from SGI) configure Torque differently - instead of listing each node name once/slot in the nodefile, they list the node only once and set an envar to indicate the number of procs/node being allocated. Add an MCA param users can set to indicate we are in such an environment, and then use the envar to set the slots. Error out if the mode flag is given, but (a) we don't find the PBS_PPN envar, or (b) we find a node actually listed more than once in the PBS_Nodefile.
cmr=v1.7.5:reviewer=jsquyres:subject=Support SMP mode in Torque

This commit was SVN r30568.
2014-02-05 15:51:17 +00:00
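The Torque SMP-mode behaviour described in commit c617d66d98 can be sketched as follows. This is an illustration under stated assumptions, not the code in the commit: `slots_per_node` is a hypothetical function, the nodefile contents are passed in as an already-read list of node names, and the environment is passed in as a mapping so the sketch can be exercised without a Torque allocation.

```python
from collections import Counter
from typing import Dict, List, Mapping


def slots_per_node(nodefile_entries: List[str],
                   env: Mapping[str, str],
                   smp_mode: bool = False) -> Dict[str, int]:
    """Derive slots per node from nodefile entries (hypothetical helper)."""
    counts = Counter(nodefile_entries)
    if not smp_mode:
        # Usual layout: each node name is listed once per slot.
        return dict(counts)

    # SMP layout: each node is listed once and PBS_PPN gives procs/node.
    if "PBS_PPN" not in env:
        raise ValueError("SMP mode requested but PBS_PPN is not set")
    repeated = sorted(name for name, n in counts.items() if n > 1)
    if repeated:
        raise ValueError("SMP mode requested but nodes are listed more "
                         "than once: " + ", ".join(repeated))
    ppn = int(env["PBS_PPN"])
    return {name: ppn for name in counts}
```

The two error branches correspond to cases (a) and (b) in the commit message: the mode flag is given but the envar is missing, or a node appears more than once in the nodefile.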
Ralph Castain
1326ed704f Per the RFC discussed here:
http://www.open-mpi.org/community/lists/devel/2014/01/13789.php

add support for async modex when requested.

cmr=v1.7.5:reviewer=jsquyres:subject=Add async modex support

This commit was SVN r30565.
2014-02-05 14:39:27 +00:00
Ralph Castain
230336b6a8 Upgrade the security framework to avoid multiple hits against the global security server. Add support for future case where mpirun assigns a global security credential for a given run, though we need to work out how to handle connect-accept from other mpirun's in that case. Remove a bunch of duplicate code in the OOB by consolidating the connection handshake code.
Refs trac:4221

This commit was SVN r30554.

The following Trac tickets were found above:
  Ticket 4221 --> https://svn.open-mpi.org/trac/ompi/ticket/4221
2014-02-04 14:47:04 +00:00
Adrian Reber
fde1040d2f Use unique collective ids for the checkpoint/restart code
This commit was SVN r30552.
2014-02-04 14:03:05 +00:00
Ralph Castain
5980b7e042 Add a security framework for authenticating connections - we will add LDAP, Kerberos, and Keystone support in the next month. For now, just put a placeholder "basic" module that does the minimum.
Wire the security check into ORTE's OOB handshake, and add a "version" check to ensure that both ends are from the same ORTE version. If not, report the mismatch and refuse the connection

Fixes trac:4171

cmr=v1.7.5:reviewer=jsquyres:subject=Add a security framework for authenticating connections

This commit was SVN r30551.

The following Trac tickets were found above:
  Ticket 4171 --> https://svn.open-mpi.org/trac/ompi/ticket/4171
2014-02-04 01:38:45 +00:00
Ralph Castain
e43589ed84 Fix warning - thanks to Paul Hargrove for reporting it
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r30548.
2014-02-03 23:51:45 +00:00
Ralph Castain
993198cfba Fix lost message problem - if multiple messages are queued before the connection is formed, we lost all but the first one. Ensure that all messages get properly queued prior to completing the connection
cmr=v1.7.4:reviewer=jsquyres:subject=Fix lost message problem

This commit was SVN r30516.
2014-01-31 05:30:51 +00:00
Ralph Castain
2bc9fd30ee Orcm sends heartbeats to its daemons, but ORTE needs to continue sending it to the HNP
This commit was SVN r30514.
2014-01-31 01:56:01 +00:00
Ralph Castain
193cceb483 Okay, since a certain other RM out there made a fuss about being able to lock their daemons to specified cores, offer the same option here. The MCA param orte_daemon_cores can be used to specify which core(s) you want the orte daemons to use. This will have no bearing on the application procs - unbound will remain unbound, and binding directives will be applied to the apps.
Yippee skippee...

This commit was SVN r30513.
2014-01-30 23:50:14 +00:00
Rolf vandeVaart
f7055de78e Stop listening thread and wait for it to terminate.
This commit was SVN r30507.
2014-01-30 20:37:15 +00:00
Ralph Castain
83e32aadb7 Add a variant of opal_init/finalize for running unit tests
This commit was SVN r30497.
2014-01-30 11:14:36 +00:00
Ralph Castain
db92ac3ce1 Cleanup role of aggregator relative to daemons
Refs trac:4176

This commit was SVN r30495.

The following Trac tickets were found above:
  Ticket 4176 --> https://svn.open-mpi.org/trac/ompi/ticket/4176
2014-01-30 00:53:30 +00:00
Ralph Castain
ed3da20672 Add unit test for opal_db
This commit was SVN r30494.
2014-01-30 00:51:44 +00:00
Adrian Reber
af934fc6e8 removed trailing whitespaces in snapc
This commit was SVN r30489.
2014-01-29 21:27:13 +00:00
Adrian Reber
7de34ea201 SNAPC/CRCP/SSTORE: remove compiler warnings
This commit was SVN r30488.
2014-01-29 20:52:00 +00:00
Adrian Reber
5f95db3902 SNAPC: use ORTE_WAIT_FOR_COMPLETION with non-blocking receives
During the commits to make the C/R code compile again the
blocking receive calls in snapc_full_app.c were
replaced by non-blocking receive calls.
This commit adds ORTE_WAIT_FOR_COMPLETION()
after each non-blocking receive to wait for the data.

This commit was SVN r30487.
2014-01-29 20:46:14 +00:00
Adrian Reber
fa1036f38c SSTORE/CRCP: use ORTE_WAIT_FOR_COMPLETION with non-blocking receives
During the commits to make the C/R code compile again the
blocking receive calls were replaced by non-blocking
which broke the code. This patch uses ORTE_WAIT_FOR_COMPLETION()
to wait until the non-blocking calls have finished.

This commit was SVN r30486.
2014-01-29 20:30:35 +00:00
Adrian Reber
d5c1e33900 SSTORE: use dynamic buffers for rml.send and rml.recv
The sstore component was still using static buffers
for send_buffer_nb(). This patch changes opal_buffer_t buffer;
to opal_buffer_t *buffer;

This commit was SVN r30485.
2014-01-29 20:06:23 +00:00
Adrian Reber
2900f24b67 SNAPC: use dynamic buffers for rml.send and rml.recv
The snapc component was still using static buffers
for send_buffer_nb(). This patch changes opal_buffer_t buffer;
to opal_buffer_t *buffer;

This commit was SVN r30484.
2014-01-29 19:58:33 +00:00
Ralph Castain
4e3d12d9c1 Fix suicide operation when MPI app loses connection to its local daemon. In that scenario, we correctly callback up to the MPI layer notifying it of the lost connection. However, when the MPI layer calls back down to tell the RTE to abort, it is passing back a flag indicating we should report that error to our local daemon - which is dead. This leads to an infinite loop. Break it by checking the flag indicating an abnormal termination was ordered by the RTE, and don't attempt to send the message in that case.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30475.
2014-01-29 16:56:54 +00:00
Ralph Castain
410a3afa7b Fix --without-hwloc operations - must default to map-by slot in that scenario
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30474.
2014-01-29 16:54:05 +00:00
Ralph Castain
42eb0bbe1b Fix --without-hwloc builds
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30462.
2014-01-28 17:10:32 +00:00
Ralph Castain
c874ce3b61 Don't look for the coretemp file when configuring as it might not be on the head node, but is available on the backend
Refs trac:4176

This commit was SVN r30461.

The following Trac tickets were found above:
  Ticket 4176 --> https://svn.open-mpi.org/trac/ompi/ticket/4176
2014-01-28 16:15:12 +00:00
Jeff Squyres
4edeb229cc Add MPIEXEC_TIMEOUT environment variable to the man page.
cmr=v1.7.4:reviewer=rhc

This commit was SVN r30455.
2014-01-28 14:40:17 +00:00
Ralph Castain
84a0ab3a75 Ah @$#!$#% - missed one last help message that needs to be corrected.
cmr=v1.7.4:reviewer=jsquyres:subject=correct help message

This commit was SVN r30449.
2014-01-28 04:03:24 +00:00
Ralph Castain
941bfd4604 Final cleanup of cpus-per-proc for 1.7.4 - provide better checking for cpus-per-proc and mismatched mapping/binding directives, and provide error messages telling the user what to do to get it right.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30438.
2014-01-27 22:40:51 +00:00
Ralph Castain
53b1be5067 Only report launch progress when specifically requested to do so. Thanks to Tetsuya Mishima for spotting it.
Reviewed by rhc and RM-approved

cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r30434.
2014-01-27 15:17:42 +00:00
Ralph Castain
956aab03a7 Track the origin of a message so it can be passed across transports
Refs trac:4184

This commit was SVN r30433.

The following Trac tickets were found above:
  Ticket 4184 --> https://svn.open-mpi.org/trac/ompi/ticket/4184
2014-01-26 21:09:26 +00:00
Ralph Castain
11562ab7cb Ensure we build the sensor components even if the local system doesn't have the required directories and/or access permissions. Backend nodes that get the binary may have them, and aggregators need to load the component so they can log data even if they aren't locally monitoring. Detect that we can't access the required files when we first try to sample and turn the sampling portion of the plugin off at that time.
Refs trac:4172

This commit was SVN r30426.

The following Trac tickets were found above:
  Ticket 4172 --> https://svn.open-mpi.org/trac/ompi/ticket/4172
2014-01-25 04:34:33 +00:00
Jeff Squyres
21ffddbbd0 Addendum to r30408: if we're going to remove stale kruft, let's remove
all of it.  :-)

Refs trac:4175.

This commit was SVN r30417.

The following SVN revision numbers were found above:
  r30408 --> open-mpi/ompi@31acdb15bc

The following Trac tickets were found above:
  Ticket 4175 --> https://svn.open-mpi.org/trac/ompi/ticket/4175
2014-01-24 22:19:36 +00:00
Ralph Castain
f73d23e723 Correct the location of the counter when tracking process launch for reporting progress
cmr=v1.7.4:reviewer=hjelmn

This commit was SVN r30415.
2014-01-24 21:03:05 +00:00
Ralph Castain
e3cb4b4a5b Grant Nathan his wish - add a --disable-getpwuid to the configure options and protect all users of that code so it disappears if disabled.
cmr=v1.7.5:reviewer=hjelmn:subject=disable getpwuid if requested

This commit was SVN r30413.
2014-01-24 19:18:37 +00:00
Ralph Castain
e496e348a4 Some cleanup of the sensor system to ensure things go in the right place, avoid segfaults under abnormal conditions, etc.
cmr=v1.7.5:reviewer=rhc

This commit was SVN r30409.
2014-01-24 17:29:24 +00:00
Ralph Castain
31acdb15bc We haven't really supported orteCC in a long time, so let's remove the stale cruft. Thanks to Paul Hargrove for noticing!
cmr=v1.7.4:reviewer=jsquyres:subject=remove stale orteCC cruft

This commit was SVN r30408.
2014-01-24 17:26:54 +00:00
Adrian Reber
0af2897c12 removed trailing whitespaces in orte-checkpoint.c
This commit was SVN r30407.
2014-01-24 17:23:49 +00:00
Adrian Reber
659eb1b10a silence two compiler warnings
This commit was SVN r30406.
2014-01-24 17:22:28 +00:00
Adrian Reber
919260a0d2 fix communication between orte-checkpoint and orterun
Right after starting the communication with orterun the buffer
containing the message is deleted. This patch removes the deletion
of the buffer which is now done by orte_rml_send_callback(). This is
now also the callback function used by orte_rml.send_buffer_nb().
The previous callback hnp_receiver() was introduced by an
earlier patch which was only trying to get the code to compile again.

This commit was SVN r30405.
2014-01-24 17:18:28 +00:00
Adrian Reber
8c93ebffeb orte_snapc_base_select() wants to know if it is an application
The function

   int orte_snapc_base_select(bool seed, bool app);

wants to know if it is called by an application or not. Therefore
it expects as second parameter 'bool app'. It used to be
'!ORTE_PROC_IS_DAEMON' which is not always correct if it is
a tool or a HNP. This patch changes it to ORTE_PROC_IS_APP, which
has the correct information if it is an application.

This commit was SVN r30404.
2014-01-24 17:14:41 +00:00
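The difference between the two predicates in 8c93ebffeb can be sketched in C as follows. The role enum and helper names are hypothetical stand-ins, not the ORTE macros; the sketch only shows that "not a daemon" also matches tools and the HNP, while an explicit application test does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical process roles; the real ORTE_PROC_IS_* macros are not
 * reproduced here. */
typedef enum {
    ROLE_HNP,
    ROLE_DAEMON,
    ROLE_TOOL,
    ROLE_APP
} proc_role_t;

/* Old test: anything that is not a daemon is treated as an application. */
bool is_app_old(proc_role_t role)
{
    return role != ROLE_DAEMON;
}

/* New test: only a real application process counts. */
bool is_app_new(proc_role_t role)
{
    return role == ROLE_APP;
}
```

With these definitions, is_app_old(ROLE_TOOL) and is_app_old(ROLE_HNP) both return true, which is the misclassification the commit describes.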
Ralph Castain
14bf1c9463 Some minor cleanups:
* don't return null if someone wants to print ORTE_SUCCESS

* rename some stale process types

* keep show_help local if we are in standalone operation as there is nobody to send it to

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30400.
2014-01-23 21:35:20 +00:00
Ralph Castain
32996cd705 Add new sensors for chip frequency and power (when permissions allow). Note that we don't support all chipsets at this time, but others are welcome to extend as desired.
cmr=v1.7.5:reviewer=rhc

This commit was SVN r30399.
2014-01-23 21:33:21 +00:00
Ralph Castain
886fee9367 Properly set num_procs when np is not given, but cpus-per-proc is used. Thanks to Tetsuya Mishima for pointing it out
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30389.
2014-01-23 05:01:07 +00:00
Ralph Castain
a01470190d Allow a little more flexibility - if getpwuid fails, just use the return from getuid to define the session directory
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30388.
2014-01-23 05:00:05 +00:00
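A minimal sketch of the fallback described in a01470190d, using only the POSIX calls getuid() and getpwuid(). The function name and the idea of returning a string for use in a session directory name are assumptions for illustration; the actual session directory code is not shown in this log.

```c
#include <assert.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a user identifier into out (hypothetical helper).
 * Prefers the login name from getpwuid(); if that lookup fails,
 * falls back to the numeric uid returned by getuid().
 * Returns the number of characters written, or -1 if out is too small. */
int session_user_id(char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);

    uid_t uid = getuid();
    struct passwd *pw = getpwuid(uid);
    int n;

    if (pw != NULL && pw->pw_name != NULL) {
        n = snprintf(out, outlen, "%s", pw->pw_name);
    } else {
        /* Lookup failed (for example, no passwd entry): use the uid. */
        n = snprintf(out, outlen, "%lu", (unsigned long)uid);
    }
    return (n < 0 || (size_t)n >= outlen) ? -1 : n;
}
```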
Ralph Castain
de07a64599 Cleanup the sensor code:
* use the global flags for linux and apple being found instead of re-doing the case statements

* update select procedure to ignore components that measure the same thing (e.g., resusage and sigar), taking the higher priority module

cmr=v1.7.5:reviewer=jsquyres:subject=Cleanup the sensor code

This commit was SVN r30368.
2014-01-22 21:01:09 +00:00
Jeff Squyres
7768828d2d Addendum to r30298: tweak the wording of the help messages a bit.
Refs trac:4117.  Please use this commit rather than the patch attached to
the ticket; the patch had a few mistakes in the tweaked wording.

This commit was SVN r30362.

The following SVN revision numbers were found above:
  r30298 --> open-mpi/ompi@58479399c3

The following Trac tickets were found above:
  Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117
2014-01-22 12:17:14 +00:00
Ralph Castain
e0edc29029 Add comment on future work
This commit was SVN r30336.
2014-01-20 19:54:31 +00:00
Ralph Castain
9b2066cfba Add two new sensor modules - one to monitor core temperatures, and the other to monitor resource usage using the sigar library
This commit was SVN r30335.
2014-01-20 19:35:48 +00:00
Ralph Castain
3e9c8497e0 Shift the verbose output a bit
Refs trac:4136

This commit was SVN r30332.

The following Trac tickets were found above:
  Ticket 4136 --> https://svn.open-mpi.org/trac/ompi/ticket/4136
2014-01-20 14:41:37 +00:00
Ralph Castain
5ad9795bd8 Cleanup some potential memory overruns
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30331.
2014-01-19 16:31:26 +00:00
Ralph Castain
9f6fd7b98d A few corrections to hostfile parsing - thanks to Tetsuya Mishima for the review
Refs trac:4136

This commit was SVN r30330.

The following Trac tickets were found above:
  Ticket 4136 --> https://svn.open-mpi.org/trac/ompi/ticket/4136
2014-01-19 16:26:12 +00:00
Ralph Castain
657796f9e0 Revert r30327 - turns out it isn't quite right just yet. :-(
Closes trac:4138

This commit was SVN r30328.

The following SVN revision numbers were found above:
  r30327 --> open-mpi/ompi@87d5f86025

The following Trac tickets were found above:
  Ticket 4138 --> https://svn.open-mpi.org/trac/ompi/ticket/4138
2014-01-18 23:38:39 +00:00
Ralph Castain
87d5f86025 Enable use of unix domain sockets for local OOB communications, thereby removing the requirement for an active network interface when running strictly on a single node. Update the overall OOB system to support cross-transport movement of messages so that the OOB can move a received message to another transport for transmission.
cmr=v1.7.5:reviewer=jsquyres:subject=Enable use of unix domain sockets for local OOB communications

This commit was SVN r30327.
2014-01-18 21:36:49 +00:00
Ralph Castain
fcdd904af4 Simplify and update hostfile handling to correctly support hostfiles that list nodes multiple times, once for each slot, and those that list a host once and include an explicit slot count. Eliminate support for mixing those two modes as this logic became just too complex when attempting to handle all the corner cases.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30325.
2014-01-18 16:08:40 +00:00
Ralph Castain
87f34860fe Protect array against crossing boundaries
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30316.
2014-01-17 21:36:20 +00:00
Mike Dubman
874c4e2558 PMI2: add missing file from prev commit
Refs trac:4119

This commit was SVN r30301.

The following Trac tickets were found above:
  Ticket 4119 --> https://svn.open-mpi.org/trac/ompi/ticket/4119
2014-01-16 13:17:08 +00:00
Mike Dubman
98234b5a69 SLURM/PMI2: Fix parsing of PMI2 process mapping
fixed by AlexM, reviewed by miked
cmr=v1.7.4:reviewer=rhc

This commit was SVN r30300.
2014-01-16 12:05:29 +00:00
Ralph Castain
58479399c3 As per RFC and telecon, deprecate cmd line options and their corresponding MCA params for old-style mapping and binding directives
cmr=v1.7.5:reviewer=jsquyres:subject=deprecate old-style mapping and binding directives

This commit was SVN r30298.
2014-01-15 14:48:39 +00:00
Ralph Castain
590a87c730 You can't pass static buffer definitions to rml.send as it will attempt to release them upon completion - you need to send dynamically allocated buffers
This commit was SVN r30261.
2014-01-11 19:38:11 +00:00
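The ownership rule stated in 590a87c730 can be illustrated with a small C sketch. Everything here is hypothetical: sketch_buffer_t stands in for opal_buffer_t and sketch_send_nb for the non-blocking send, and the "send" completes immediately. The point is only that a callee which releases the buffer on completion must be given heap memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer type standing in for opal_buffer_t. */
typedef struct {
    char  *payload;
    size_t len;
} sketch_buffer_t;

int released_count = 0;  /* counts buffers released after "sending" */

/* Allocate a buffer on the heap and copy the payload into it. */
sketch_buffer_t *sketch_buffer_new(const char *data, size_t len)
{
    sketch_buffer_t *buf = malloc(sizeof(*buf));
    if (buf == NULL) {
        return NULL;
    }
    buf->payload = malloc(len);
    if (buf->payload == NULL) {
        free(buf);
        return NULL;
    }
    memcpy(buf->payload, data, len);
    buf->len = len;
    return buf;
}

/* Hypothetical non-blocking send: takes ownership of buf and releases it
 * once the (here, immediate) send completes. Passing the address of a
 * stack variable would make the free(buf) below undefined behavior. */
void sketch_send_nb(sketch_buffer_t *buf)
{
    assert(buf != NULL);
    /* ... hand buf->payload to the transport here ... */
    free(buf->payload);
    free(buf);
    released_count++;
}
```

The caller allocates with sketch_buffer_new(), passes the pointer to sketch_send_nb(), and does not touch or free the buffer afterwards.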
Ralph Castain
286ff6d552 For large scale systems, we would like to avoid doing a full modex during MPI_Init so that launch will scale a little better. At the moment, our options are somewhat limited as only a few BTLs don't immediately call modex_recv on all procs during startup. However, for those situations where someone can take advantage of it, add the ability to do a "modex on demand" retrieval of data from remote procs when we launch via mpirun.
NOTE: launch performance will be absolutely awful if you do this with BTLs that aren't configured to modex_recv on first message!

Even with "modex on demand", we still have to do a barrier in place of the modex - we simply don't move any data around, which does reduce the time impact. The barrier is required to ensure that the other proc has in fact registered all its BTL info and therefore is prepared to hand over a complete data package. Otherwise, you may not get the info you need. In addition, the shared memory BTL can fail to properly rendezvous as it expects the barrier to be in place.

This behavior will *only* take effect under the following conditions:

1. launched via mpirun

2. #procs is greater than ompi_hostname_cutoff, which defaults to UINT32_MAX

3. mca param rte_orte_direct_modex is set to 1. At the moment, we are having problems getting this param to register properly, so only the first two conditions are in effect. Still, the bottom line is you have to *want* this behavior to get it.

The planned next evolution of this will be to make the direct modex be non-blocking - this will require two fixes:

1. if the remote proc doesn't have the required info, then let it delay its response until it does. This means we need a way for the MPI layer to tell the RTE "I am done entering modex data".

2. adjust the SM rendezvous logic to loop until the required file has been created

Creating a placeholder to bring this over to 1.7.5 when ready.

cmr=v1.7.5:reviewer=hjelmn:subject=Enable direct modex at scale

This commit was SVN r30259.
2014-01-11 17:36:06 +00:00
Ralph Castain
fb9e427320 One last corner case - when encountering an overload condition (e.g., by comm_spawning more procs than we have cores) and we are using the default binding policy, do *not* bind the new procs to anything as this can cause major problems. Instead, let the spawn succeed since the user didn't specifically ask to be bound, and leave the new procs as unbound.
Refs trac:4077

This commit was SVN r30200.

The following Trac tickets were found above:
  Ticket 4077 --> https://svn.open-mpi.org/trac/ompi/ticket/4077
2014-01-09 22:39:34 +00:00
Ralph Castain
24e990e747 Fix comm_spawn for oversubscribed systems by correctly computing the number of available slots
cmr=v1.7.4:reviewer=jsquyres:subject=Fix comm_spawn for oversubscribed systems

This commit was SVN r30197.
2014-01-09 20:33:48 +00:00
Ralph Castain
9fcb46d85a Correctly detect and handle oversubscription for comm_spawn
cmr=v1.7.4:reviewer=jsquyres:subject=Correctly detect and handle oversubscription for comm_spawn

This commit was SVN r30186.
2014-01-09 18:27:51 +00:00
Ralph Castain
6e5fedeb04 Oops - add verbose output to inform the user that we cannot default bind because no cores were detected
Refs trac:4074

This commit was SVN r30185.

The following Trac tickets were found above:
  Ticket 4074 --> https://svn.open-mpi.org/trac/ompi/ticket/4074
2014-01-09 18:17:14 +00:00
Ralph Castain
4cdc291df1 Ensure slurm properly dies on abnormal termination
cmr=v1.7.4:reviewer=jsquyres:subject=Ensure slurm properly dies on abnormal termination

This commit was SVN r30182.
2014-01-09 16:52:02 +00:00
Jeff Squyres
87e476ebd8 Clean up many references to "rank": usually change to "process" and/or
specifically delineate that we're referring to the process' rank in
MPI_COMM_WORLD.

Refs trac:4068

This commit was SVN r30181.

The following Trac tickets were found above:
  Ticket 4068 --> https://svn.open-mpi.org/trac/ompi/ticket/4068
2014-01-09 16:37:49 +00:00
Ralph Castain
7e4748a0f1 Handle the case of nodes that do not report cores, where our default binding policy would fail even though binding is supported, by defaulting to not binding on those nodes.
Thanks to Paul Hargrove for reporting the problem on NetBSD.

cmr=v1.7.4:reviewer=jsquyres:subject=Handle the case of nodes that do not report cores

This commit was SVN r30180.
2014-01-09 16:27:58 +00:00
Ralph Castain
f179f2086b Do a better job of reporting bindings - if someone gives a spec that binds us to all processors, then we are effectively unbound and should report it clearly instead of outputting a long line of B's.
cmr=v1.7.4:reviewer=jsquyres:subject=Do a better job of reporting bindings

This commit was SVN r30179.
2014-01-09 16:16:16 +00:00
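The reporting change in f179f2086b can be sketched as follows. This is an illustration only: the function name, the 64-processor limit, and the output strings ("unbound", 'B' and '.') are assumptions and not Open MPI's actual report format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Render a binding for up to 64 processors into out.
 * Bit i of mask is set when the process is bound to processor i.
 * A mask covering every processor is reported as "unbound" instead of
 * a full row of B's. */
void format_binding(uint64_t mask, unsigned ncpus, char *out, size_t outlen)
{
    assert(ncpus >= 1 && ncpus <= 64 && outlen > ncpus);

    uint64_t all = (ncpus == 64) ? UINT64_MAX : ((UINT64_C(1) << ncpus) - 1);

    if ((mask & all) == all) {
        snprintf(out, outlen, "unbound");
        return;
    }

    /* Start with all dots, then mark each bound processor with 'B'. */
    memset(out, '.', ncpus);
    for (unsigned i = 0; i < ncpus; i++) {
        if (mask & (UINT64_C(1) << i)) {
            out[i] = 'B';
        }
    }
    out[ncpus] = '\0';
}
```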
Ralph Castain
2a0e4b5e62 Update the orterun help messages and man page to reflect new map/rank/bind options and defaults. Thanks to Paul Hargrove for reporting it.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30173.
2014-01-09 04:44:28 +00:00
Ralph Castain
bf453a2575 Reference the correct variable...sigh
Refs trac:4059

This commit was SVN r30163.

The following Trac tickets were found above:
  Ticket 4059 --> https://svn.open-mpi.org/trac/ompi/ticket/4059
2014-01-08 22:36:39 +00:00
Ralph Castain
80497d73cf Need to mark the daemon as alive so that exit commands are properly routed during abnormal terminations. Also, remove stale references to the "selected oob component" as we no longer require only one component be selected
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30162.
2014-01-08 22:35:48 +00:00
Ralph Castain
d5647394d8 Initialize variable so dash-host option gets correctly parsed
cmr=v1.7.4:reviewer=rolfv

This commit was SVN r30159.
2014-01-08 15:17:16 +00:00
Ralph Castain
e724d0d12d Ensure comm_spawn'd jobs get treated the same wrt setting default mapping directives
Refs trac:4059

This commit was SVN r30158.

The following Trac tickets were found above:
  Ticket 4059 --> https://svn.open-mpi.org/trac/ompi/ticket/4059
2014-01-08 15:16:22 +00:00
Ralph Castain
fb650aed0c Fix how we transfer mapping directives to the job, ensuring that directives that can be given outside of a mapping policy (e.g., oversubscribe and no-use-local) are retained.
cmr=v1.7.4:reviewer=jsquyres:subject=Fix how we transfer mapping directives to the job

This commit was SVN r30155.
2014-01-08 04:25:43 +00:00
Ralph Castain
bc75250951 Cleanup the sensor framework close - existing code was using incorrect object type. Don't start sensors if sample rate is zero. Don't add zero-byte data from resusage as it means nothing was measured.
cmr=v1.7.4:reviewer=hjelmn

This commit was SVN r30150.
2014-01-08 02:38:56 +00:00
Jeff Squyres
13b29cff2c This commit complements/completes r30140. r30140 made all the
configury/Makefile.am changes; this commit renames the internal
installdirs.h framework struct field names to match the configury macro
names:

 * pkgdatadir -> ompidatadir
 * pkglibdir -> ompilibdir
 * pkgincludedir -> ompiincludedir

This commit was SVN r30145.

The following SVN revision numbers were found above:
  r30140 --> open-mpi/ompi@8b778903d8
2014-01-07 23:36:33 +00:00
Brian Barrett
8b778903d8 Fix longstanding issue with our multi-project support. Rather than using
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi.  This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.

This commit was SVN r30140.
2014-01-07 22:11:15 +00:00
Mike Dubman
40aadab85f re-enable map-by dist
after last refactoring in rmaps, map-by dist:hca  was disabled.
reverting it back

found/fixed by Elena, reviewed by miked

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30118.
2014-01-04 20:44:41 +00:00
Ralph Castain
9a855ff58e Update sensor component for new OOB calls
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30117.
2014-01-03 22:37:15 +00:00
Ralph Castain
3f2b3c53ea Ensure that rankfile-provided allocations are correctly handled
Fixes trac:4043

cmr=v1.7.4:reviewer=jsquyres:subject=Ensure that rankfile-provided allocations are correctly handled

This commit was SVN r30106.

The following Trac tickets were found above:
  Ticket 4043 --> https://svn.open-mpi.org/trac/ompi/ticket/4043
2014-01-02 16:07:16 +00:00
Ralph Castain
d5a5caa7e0 Restore the bycore mpirun option for backward compatibility
Refs trac:4044

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30103.

The following Trac tickets were found above:
  Ticket 4044 --> https://svn.open-mpi.org/trac/ompi/ticket/4044
2014-01-02 04:16:43 +00:00
Ralph Castain
a8a91b374e Update component-level selection comments to match latest revisions
cmr=v1.7.4:reviewer=rhc

This commit was SVN r30087.
2013-12-25 19:12:43 +00:00
Ralph Castain
d049731911 Add pubsub pmi component to list of components to avoid when indirect launch used
Refs trac:4032

This commit was SVN r30083.

The following Trac tickets were found above:
  Ticket 4032 --> https://svn.open-mpi.org/trac/ompi/ticket/4032
2013-12-25 16:25:37 +00:00
Ralph Castain
85f2429819 Ensure the ipv6 lists get initialized and finalized
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30081.
2013-12-24 17:24:39 +00:00
Ralph Castain
2e08219cac Silence the valgrind report from the OOB
Refs trac:4033

This commit was SVN r30080.

The following Trac tickets were found above:
  Ticket 4033 --> https://svn.open-mpi.org/trac/ompi/ticket/4033
2013-12-24 17:06:45 +00:00
Ralph Castain
81df8d09ca Avoid use of PMI components when launched via mpirun as this is just unnecessary overhead that can cause confusion.
cmr=v1.7.4:reviewer=miked:subject=Avoid use of PMI components when launched via mpirun

This commit was SVN r30078.
2013-12-24 16:32:31 +00:00
Ralph Castain
01ee5f380b Remove debug - problem has been identified
Refs trac:4026

This commit was SVN r30075.

The following Trac tickets were found above:
  Ticket 4026 --> https://svn.open-mpi.org/trac/ompi/ticket/4026
2013-12-24 15:22:18 +00:00
Jeff Squyres
ce02002a5e Free minor memory leak / squash valgrind still-reachable warning.
cmr=v1.7.5:reviewer=rhc

This commit was SVN r30071.
2013-12-24 11:04:38 +00:00
Ralph Castain
38f46641ce Ensure the recv handler has been initialized
Refs trac:4026

This commit was SVN r30068.

The following Trac tickets were found above:
  Ticket 4026 --> https://svn.open-mpi.org/trac/ompi/ticket/4026
2013-12-24 06:09:45 +00:00
Ralph Castain
bb80625a8a Add missing var initialization
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r30063.
2013-12-24 00:02:22 +00:00
Ralph Castain
65228d3571 Don't use "size_t" for the nbytes field in the header - use uint32_t to ensure that ntohl/htonl correctly match it
Refs trac:4026

This commit was SVN r30062.

The following Trac tickets were found above:
  Ticket 4026 --> https://svn.open-mpi.org/trac/ompi/ticket/4026
2013-12-23 21:39:49 +00:00
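The reasoning in 65228d3571 can be shown with a short C sketch. htonl() and ntohl() operate on 32-bit values, so a fixed-width uint32_t field round-trips cleanly, whereas size_t is 64 bits wide on many platforms. The header struct and helper names are hypothetical; only the nbytes field name comes from the commit message.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire header with a fixed-width length field. */
typedef struct {
    uint32_t nbytes;  /* payload length, host byte order in memory */
} msg_header_t;

/* Serialize the header into a 4-byte buffer in network byte order. */
void header_pack(const msg_header_t *hdr, unsigned char wire[4])
{
    assert(hdr != NULL);
    uint32_t net = htonl(hdr->nbytes);
    memcpy(wire, &net, sizeof(net));
}

/* Parse a 4-byte buffer back into host byte order. */
void header_unpack(const unsigned char wire[4], msg_header_t *hdr)
{
    assert(hdr != NULL);
    uint32_t net;
    memcpy(&net, wire, sizeof(net));
    hdr->nbytes = ntohl(net);
}
```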
Ralph Castain
7d8c0459a4 Attempt to debug hang that is hitting some environments. Posting to 1.7.4 as a placeholder for the eventual solution
cmr=v1.7.4:reviewer=rhc

This commit was SVN r30060.
2013-12-23 19:57:05 +00:00
Nathan Hjelm
3be4536d9b Cleanup various leaks in ompi_info reported by valgrind.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30058.
2013-12-23 17:47:43 +00:00
George Bosilca
24879f9def Code cleanup while chasing valgrind complaints.
This commit was SVN r30048.
2013-12-21 23:28:14 +00:00
George Bosilca
38cbaeaa82 Try to impose a little bit of consistency on how we parse lists of
modules by enforcing the use of OPAL list accessors.

This commit was SVN r30045.
2013-12-21 23:23:33 +00:00
Ralph Castain
264150872b Add a bunch of debug output to the OOB connection completion code so we can track down a handshake problem. Available in optimized builds as well as debug ones by setting -mca oob_base_verbose 10
No review will be required as this is just debug code for those helping us debug the 1.7.4 release candidates

cmr-=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r30043.
2013-12-21 16:09:26 +00:00
Ralph Castain
9c768df8b8 Resolve an unexpected behavior in hostfile allocations. Now that we filter allocations to determine what will be used for mapping, let the initial global pool be the union of nodes from all sources (default hostfile, hostfiles, and dash-hosts). Each app will filter down to only those specified for it using its own hostfile and dash-host options.
cmr=v1.7.4:reviewer=jsquyres:subject=Resolve an unexpected behavior in hostfile allocations

This commit was SVN r30040.
2013-12-21 01:38:27 +00:00
Adrian Reber
53a70fe87f Trying to get the C/R code to compile again. (send_*_nb)
This patch changes all send/send_buffer occurrences in the C/R code
to send_nb/send_buffer_nb.
The new code compiles but does not work.

Changes from V1:
* #ifdef out the code (so it is preserved for later re-design)
* marked the broken C/R code with ENABLE_FT_FIXED

Changes from V2:
* just replace the blocking calls with the non-blocking calls
* all #ifdef's introduced in V1 are gone
* send_* returns error code or ORTE_SUCCESS (not the number of bytes)

This commit was SVN r30036.
2013-12-20 21:58:28 +00:00
Adrian Reber
a3813d37c7 Trying to get the C/R code to compile again. (recv_*_nb)
This patch changes all recv/recv_buffer occurrences in the C/R code
to recv_nb/recv_buffer_nb.
The old code is still there but disabled using ifdefs (ENABLE_FT_FIXED).
The new code compiles but does not work.

Changes from V1:
* #ifdef out the code (so it is preserved for later re-design)
* marked the broken C/R code with ENABLE_FT_FIXED

Changes from V2:
* only #ifdef out the code where the behaviour is changed
  (used to be blocking; now non-blocking)

This commit was SVN r30035.
2013-12-20 21:05:40 +00:00
Ralph Castain
31248c0985 Correctly add support for the "env" MPI_Info key during comm_spawn, update the "map-by", "rank-by", and "bind-to" Info key behaviors to match the new mapping/ranking/binding system, and update all docs and comments to match.
Fix comm_spawn on a single host - with the new default mapping scheme, we were incorrectly computing the number of procs to put on the node.

Refs trac:4003

This commit was SVN r30033.

The following Trac tickets were found above:
  Ticket 4003 --> https://svn.open-mpi.org/trac/ompi/ticket/4003
2013-12-20 20:42:39 +00:00
Ralph Castain
71b52fe861 Ensure that comm_spawn'd procs get user-specified forwarded envars
Thanks to Tim Miller for reporting the regression from the 1.6 series

cmr=v1.7.4:reviewer=jsquyres:subject=Ensure that comm_spawn'd procs get user-specified forwarded envars

This commit was SVN r30012.
2013-12-20 14:47:35 +00:00
Ralph Castain
d47d2569f3 We stripped the process info packing routine to minimize message size when sending the launch message, but tools still require all the info. So modify the tool-hnp handshake to explicitly add the missing info
Refs trac:3992

This commit was SVN r29989.

The following Trac tickets were found above:
  Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992
2013-12-19 20:42:20 +00:00
Ralph Castain
55cd65b149 Don't warn about binding (process and/or memory) if the node cannot do it or if we would overload, but it wasn't specifically requested by the user (i.e., it is the result of the default policy). Instead, just don't bind and quietly move along.
Reset topology usage for each node as we bind as multiple nodes may be linked to the same topology object. This will need to be revisited for scale as it does take some non-zero time to reset the usage each iteration. However, storing individual topology objects for every node consumes memory, so it's a tradeoff.

cmr=v1.7.4:reviewer=jsquyres:subject=Eliminate excessive binding/memory warnings

This commit was SVN r29978.
2013-12-19 16:31:45 +00:00
Ralph Castain
9b32dacb6c Ensure we don't abort if a tool cannot send a message - the orte/util/comm library used by tools to query mpirun knows how to handle this situation.
Refs trac:3992

This commit was SVN r29975.

The following Trac tickets were found above:
  Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992
2013-12-19 07:10:36 +00:00
Ralph Castain
6239e64f36 Further cleanup of orte-ps so it doesn't abort when hitting a stale HNP - only report that event once and just keep working.
Refs trac:3992

This commit was SVN r29974.

The following Trac tickets were found above:
  Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992
2013-12-19 03:28:05 +00:00
Ralph Castain
bf5e314f76 Tools require their own errmgr and state components so they can handle any errors that occur in, for example, communication.
Refs trac:3992

This commit was SVN r29972.

The following Trac tickets were found above:
  Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992
2013-12-19 01:49:33 +00:00
Ralph Castain
3aaca16faa Silence warnings that are no longer valid
Refs trac:3992

This commit was SVN r29970.

The following Trac tickets were found above:
  Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992
2013-12-19 00:40:36 +00:00
Ralph Castain
c5956e7b8c Convert debug output to opal_output_verbose
Thanks to Tetsuya Mishima for reporting it

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29969.
2013-12-19 00:36:15 +00:00
Ralph Castain
39957df08e Fixes trac:3963. Fix the tool ess procedure so it opens and selects the OOB framework, and have the OOB TCP module update the route to new connections (the routed modules know what to do).
Thanks to Dave Love and Ashley Pittman for pointing out the problem.

cmr=v1.7.4:reviewer=jsquyres:subject=Fix tool communications with mpirun

This commit was SVN r29959.

The following Trac tickets were found above:
  Ticket 3963 --> https://svn.open-mpi.org/trac/ompi/ticket/3963
2013-12-18 23:13:46 +00:00
Ralph Castain
77553f72be Per this email thread:
http://www.open-mpi.org/community/lists/devel/2013/12/13412.php

fix the backtrace function to avoid async issues. Thanks to Takahiro Kawashima for the patch

This commit was SVN r29955.
2013-12-18 17:57:37 +00:00
Ralph Castain
ab4636c47b Per email on devel list, change the default rank-by to slot unless map-by <obj> is specified, in which case use rank-by <obj>
Refs trac:3977

This commit was SVN r29945.

The following Trac tickets were found above:
  Ticket 3977 --> https://svn.open-mpi.org/trac/ompi/ticket/3977
2013-12-18 00:48:50 +00:00
Ralph Castain
53cd00fe16 By setting a default mapping/ranking/binding policy that wasn't "none", we introduced a problem for users of the Mac and any other machine where sockets aren't defined and/or binding is not supported. Fix that by checking to see if the user specified the failing policy - if not, then fall back to the old map/rank by slot and no binding.
Refs trac:3977

This commit was SVN r29933.

The following Trac tickets were found above:
  Ticket 3977 --> https://svn.open-mpi.org/trac/ompi/ticket/3977
2013-12-17 14:50:10 +00:00
Adrian Reber
b42aad44a3 Trying to get the C/R code to compile again. This patch
includes various fixes all over the C/R code which are
hard to group like the other patches.

Changes from V1:
* explain why mca_base_component_distill_checkpoint_ready no longer works
* compare return result of opal functions with OPAL_* values

Changes from V2:
* use orte_rml_oob_ft_event() instead of referencing through the modules
* properly protect variable (thanks to --enable-picky)

This commit was SVN r29922.
2013-12-16 15:35:28 +00:00
Ralph Castain
8b6d117541 Per the OMPI devel conference that changed our default behaviors:
* default to bind-to core 
* map-by slot if np=2
* map-by socket (balance across sockets on each node) if np > 2
* map-by <obj> will imply rank-by <obj> by default (leave default binding as above) 

Fix a bug in the map-by <obj> mapper where we incorrectly compute the #procs to assign if the #slots > #procs

cmr=v1.7.4:reviewer=jsquyres:subject=Update default binding and mapping values

This commit was SVN r29919.
2013-12-15 17:25:54 +00:00
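The default selection listed in 8b6d117541 can be sketched as a small C function. The enum and struct names are hypothetical. The message says "np=2" for the slot case; treating every np of 2 or less as that case is an assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical policy identifiers for illustration. */
typedef enum { POLICY_SLOT, POLICY_CORE, POLICY_SOCKET } policy_t;

typedef struct {
    policy_t map_by;
    policy_t bind_to;
} default_policies_t;

/* Pick the defaults when the user gave no mapping or binding directive. */
default_policies_t default_policies(unsigned np)
{
    default_policies_t d;

    assert(np > 0);
    /* Assumption: np <= 2 maps by slot; larger jobs balance across sockets. */
    d.map_by  = (np <= 2) ? POLICY_SLOT : POLICY_SOCKET;
    d.bind_to = POLICY_CORE;  /* default to bind-to core */
    return d;
}
```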
Jeff Squyres
770bf77149 Fix some minor memory leaks in error code paths.
Many thanks to Tom Fogal for the patch.

cmr=v1.7.4:reviewer=rhc:subject=Fix minor memory leaks in error code paths

This commit was SVN r29905.
2013-12-14 00:41:21 +00:00
Jeff Squyres
0ab48ad0d2 Fix some annoying flex warnings that have been there for years.
Many thanks to Tom Fogal for the initial patch.

cmr=v1.7.4:reviewer=rhc:subject=Fix annoying flex warnings

This commit was SVN r29904.
2013-12-14 00:36:12 +00:00
Jeff Squyres
2e7653e4c2 Add missing argv.h includes.
Noticed these as part of #3694: external libevents don't cause argv.h
to automatically get included.

Refs trac:3694

This commit was SVN r29897.

The following Trac tickets were found above:
  Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
2013-12-13 21:17:36 +00:00
Brian Barrett
121ca26c59 Per discussion at Developer's Meeting, remove Solaris threads support. Solaris
will just fall back to pthreads, which should be no problem.

This commit was SVN r29893.
2013-12-13 20:07:11 +00:00
Ralph Castain
0e81959aae Cleanup mindist error messages - already patched in 1.7
This commit was SVN r29869.
2013-12-12 15:30:29 +00:00
Ralph Castain
1ff12362da Cleanup merge conflict that was incorrectly committed
This commit was SVN r29851.
2013-12-09 20:20:14 +00:00
Ralph Castain
83e59e6761 Once again, the Slurm folks have decided to redefine their envars, reversing what they had previously told us to do. So cleanup the Slurm allocation code, and also adjust to a change in srun behavior that now aborts a job if the ntasks-per-node doesn't get specified when ORTE calls it, but the user specified it when getting an allocation. Sigh.
cmr=v1.7.4:reviewer=miked:subject=Update Slurm allocation and launch

This commit was SVN r29849.
2013-12-09 17:58:46 +00:00
Mike Dubman
c208b858e7 improve error messages in mindist
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29846.
2013-12-09 06:34:38 +00:00
Ralph Castain
f2c49c6c19 Fix the map-by object mapper to handle cpus-per-proc by accounting for the request when computing the number of procs to put on each object. This ensures that the binding routine doesn't automatically overload the cores.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29843.
2013-12-08 16:59:25 +00:00
Ralph Castain
9604f36c3b Specify units for the job completion timeout
This commit was SVN r29839.
2013-12-08 04:51:58 +00:00
Ralph Castain
62c9e5c64c Really is better if we output a message indicating that the job was aborted due to hitting the execution time limit
Refs trac:3960

This commit was SVN r29833.

The following Trac tickets were found above:
  Ticket 3960 --> https://svn.open-mpi.org/trac/ompi/ticket/3960
2013-12-07 15:33:56 +00:00
Ralph Castain
d44e4a311f Per request from Dave Goodell, add support for MPIEXEC_TIMEOUT - if set in the environment, terminate the job after the specified number of seconds has passed. Equivalent to MPICH functionality.
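
A rough sketch of how such an environment variable could be read, assuming the value is a positive whole number of seconds and that anything else means "no timeout". The function names are hypothetical; only the variable name MPIEXEC_TIMEOUT comes from the message.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a timeout string. Returns the number of seconds (> 0),
 * or 0 when the value is missing, malformed, or not positive. */
long sketch_parse_timeout(const char *value)
{
    char *end = NULL;
    long secs;

    if (value == NULL || *value == '\0') {
        return 0;
    }
    errno = 0;
    secs = strtol(value, &end, 10);
    if (errno != 0 || end == value || *end != '\0' || secs <= 0) {
        return 0;
    }
    return secs;
}

/* Read the timeout from the environment; 0 means "run without a limit". */
long sketch_job_timeout_from_env(void)
{
    return sketch_parse_timeout(getenv("MPIEXEC_TIMEOUT"));
}
```

A launcher could arm a timer with the returned value and terminate the job when it fires; a return of 0 would simply skip the timer.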
cmr=v1.7.4:reviewer=dgoodell:subject=add support for MPIEXEC_TIMEOUT

This commit was SVN r29831.
2013-12-07 01:58:32 +00:00
Jeff Squyres
ed9aba3896 This patch fixes
error: void value not ignored as it ought to be 

in the C/R code by ignoring the return value of functions which 
no longer return a value (only void). 

Signed-off-by: Adrian Reber <adrian.reber@hs-esslingen.de>

This commit was SVN r29816.
2013-12-06 14:40:10 +00:00
Ralph Castain
fb59b6b875 Silence compiler warning when --disable-orte-static-ports
This commit was SVN r29783.
2013-12-03 01:53:31 +00:00
Ralph Castain
617a0edbb8 Fix hostfile parsing for the case where RMs count slots by listing the node multiple times. Thanks to Tetsuya Mishima for reporting the problem and providing a patch.
cmf=v1.7.4:reviewer=rhc

This commit was SVN r29748.
2013-11-24 16:17:52 +00:00
Ralph Castain
7c23a5ad65 Fix headers when building with ft enabled. Thanks to Adrian Reber for the patch!
This commit was SVN r29743.
2013-11-23 22:58:32 +00:00
Ralph Castain
7480beb7f0 Per request from Nathan, add an offset value to the job struct so we can construct a "global rank" that spans multiple jobs during dynamic launch operations. Store a new ORTE_DB_GLOBAL_RANK value for each process in the database, and ensure that we share our own value during connect_accept so both sides can see it.
This isn't being used yet - just enabling Nathan to do what he needs.

***** NOTE: any use of the OMPI_DB_GLOBAL_RANK database key must be protected by #ifdef OMPI_DB_GLOBAL_RANK as not all RTE's will define this key. *****

This commit was SVN r29708.
2013-11-14 17:01:43 +00:00
Ralph Castain
0f420f3676 Add a little debug
cmr:v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29705.
2013-11-14 04:22:59 +00:00
Ralph Castain
561c1830f7 Cleanup the radix MCA params - the max connections has no relationship to the max fd's on a node. It solely is used to improve the performance of small jobs by avoiding unnecessary inter-daemon routing.
Refs trac:3917

This commit was SVN r29704.

The following Trac tickets were found above:
  Ticket 3917 --> https://svn.open-mpi.org/trac/ompi/ticket/3917
2013-11-14 04:19:06 +00:00
Ralph Castain
540d38bc12 Per patch from Jeff, treat the case where someone transfers an archived file of an unknown type. This can't actually happen as this is on the receive end, and the error would have been treated and rejected on the send side. Still, practice defensive programming.
cmr:v1.7.4:reviewer=rhc:subject=Practice defensive programming

This commit was SVN r29699.
2013-11-13 23:49:26 +00:00
Jeff Squyres
f4e647538c mca_routed_radix_component.max_connections is unsigned; it can never be <0
cmr=v1.7.4:reviewer=hjelmn

This commit was SVN r29694.
2013-11-13 19:37:24 +00:00
Jeff Squyres
038116e4b8 orte_iof_base.output_limit is an unsigned type; just initialize it to INT_MAX
cmr=v1.7.4:reviewer=rhc

This commit was SVN r29693.
2013-11-13 19:36:43 +00:00
Mike Dubman
840e2cb4a2 mindist: cosmetic, use fallback to byslot if unable to read NUMA info, small fix.
fixed by Elena, reviewed by Ralph/Mike
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29679.
2013-11-13 09:26:40 +00:00
Ralph Castain
5b38259264 Ouch - remove an extraneous line.
Thanks to Tetsuya Mishima for reporting it

cmr=v1.7.4:reviewer=rhc:subject=Remove extraneous line from OOB

This commit was SVN r29677.
2013-11-13 04:02:05 +00:00
Ralph Castain
f1e510154c Revise the launch timeout detection so we don't mistakenly declare "failed to start". Recognize that timeout is at the per-job level, and define the timeout param as a total value instead of seconds/daemon as it otherwise can get to be an enormous (and useless) number.
Resolves problems in loop_spawn where the timer was incorrectly firing and killing the overall job.

cmr=v1.7.4:reviewer=hjelmn

This commit was SVN r29661.
2013-11-11 23:50:40 +00:00
Ralph Castain
46f633883b Correct the error check on rml.send
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29660.
2013-11-11 23:23:12 +00:00
Ralph Castain
e35ad23176 Correctly compute usage for dynamic spawns when binding is invoked. Ensure we correctly account for existing process usage on each node when computing bindings during dynamic spawns.
cmr=v1.7.4:reviewer=hjelmn:subject=Correctly compute usage for dynamic spawns when binding is invoked

This commit was SVN r29649.
2013-11-10 00:38:01 +00:00
Joshua Ladd
d594ffbfc7 Backing out Elena's patch - abstraction violation
This commit was SVN r29645.
2013-11-08 13:12:07 +00:00
Joshua Ladd
da3e272fdd Adds a check in the mindist mapper for whether or not the user asks for a specific device. This patch was submitted by Elena Elkina and reviewed by Josh Ladd and should be added to
cmr=v1.7.4:reviewer=jladd

This commit was SVN r29644.
2013-11-08 04:28:53 +00:00
Brian Barrett
6d7a1fbb82 Move opal_portable_platform.h to opal/include/opal, which is where it really
should have been all along and fix one place that uses the file

Update opal_portable_platform.h with changes to mpi_portable_platform.h made 
in r29608.

Make mpi_portable_platform.h a symlink to opal_portable_platform.h, so that
they won't get out of sync.  I'd like to remove mpi_portable_platform.h, but
we don't automatically add -I${includedir}/openmpi/ to make that sane from
a header include point of view, so that's future work.

This commit was SVN r29618.

The following SVN revision numbers were found above:
  r29608 --> open-mpi/ompi@b71bd51cdd
2013-11-06 17:12:26 +00:00
Ralph Castain
fb0940a9d9 Add a couple of useful tests
This commit was SVN r29539.
2013-10-28 13:24:16 +00:00
Ralph Castain
8c5c7d0db4 Correct a bug in handling of oob_tcp_if_include/exclude addresses by using the kernel index instead of the raw index of the interface.
Refs trac:3696

This commit was SVN r29522.

The following Trac tickets were found above:
  Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696
2013-10-26 00:47:14 +00:00
Ralph Castain
604970a1a2 Initialize orte_coprocessors hash table to NULL. Delay coprocessor detection on HNP until after node topology final definition in case rmaps changes it. Minor spacing change.
Refs trac:3847

This commit was SVN r29504.

The following Trac tickets were found above:
  Ticket 3847 --> https://svn.open-mpi.org/trac/ompi/ticket/3847
2013-10-24 00:08:47 +00:00
Ralph Castain
f5920e9312 Revert r29489. This function only executes in the HNP. In orte/mca/ess/hnp/ess_hnp_module.c, we already check for local coprocessors and add them to the hash table if found. Thus, r29489 simply overwrote what was already present.
The data for each remote daemon is added later in the daemon callback function. Only the HNP retains info in the hash table.

If it is desirable to have each daemon retain its own coprocessor info, then this must be done in orte/mca/ess/base/ess_base_std_orted.c.

This commit was SVN r29497.

The following SVN revision numbers were found above:
  r29489 --> open-mpi/ompi@2e2794fa15
2013-10-23 22:35:24 +00:00
Nathan Hjelm
2e2794fa15 Fix coprocessor detection by always adding the local daemon's co-processors
to the hash table.

Tested and working on a system with 2 Xeon Phi co-processors.

cmr=v1.7.4:ticket=3847:reviewer=ompi-rm1.7

This commit was SVN r29489.

The following Trac tickets were found above:
  Ticket 3847 --> https://svn.open-mpi.org/trac/ompi/ticket/3847
2013-10-23 15:56:23 +00:00
Ralph Castain
7c86a843c8 Silence compiler warning
This commit was SVN r29477.
2013-10-23 04:13:36 +00:00
Ralph Castain
960a255e7f Do some cleanup of the --without-hwloc build - no need to work on coprocessors since we can't detect them anyway, cleanup some unused variables in the ppr mapper
This commit was SVN r29476.
2013-10-23 01:45:21 +00:00
Ralph Castain
25a84c7f0a Fix build --without-hwloc
This commit was SVN r29453.
2013-10-19 23:12:33 +00:00
Ralph Castain
b12167abef Per a good suggestion from Jeff, make the coprocessor mapping more scalable by using a hash table to cache the coprocessor list, and then do a single pass thru the nodes at the end to assign hostid's.
Refs trac:3847

This commit was SVN r29439.

The following Trac tickets were found above:
  Ticket 3847 --> https://svn.open-mpi.org/trac/ompi/ticket/3847
2013-10-14 22:01:48 +00:00
Ralph Castain
24c811805f ****************************************************************
This change contains a non-mandatory modification
       of the MPI-RTE interface. Anyone wishing to support
       coprocessors such as the Xeon Phi may wish to add
       the required definition and underlying support
****************************************************************

Add locality support for coprocessors such as the Intel Xeon Phi.

Detecting that we are on a coprocessor inside of a host node isn't straightforward. There are no good "hooks" provided for programmatically detecting that "we are on a coprocessor running its own OS", and the ORTE daemon just thinks it is on another node. However, in order to properly use the Phi's public interface for MPI transport, it is necessary that the daemon detect that it is colocated with procs on the host.

So we have to split the locality to separately record "on the same host" vs "on the same board". We already have the board-level locality flag, but not quite enough flexibility to handle this use-case. Thus, do the following:

1. add OPAL_PROC_ON_HOST flag to indicate we share a host, but not necessarily the same board

2. modify OPAL_PROC_ON_NODE to indicate we share both a host AND the same board. Note that we have to modify the OPAL_PROC_ON_LOCAL_NODE macro to explicitly check both conditions

3. add support in opal/mca/hwloc/base/hwloc_base_util.c for the host to check for coprocessors, and for daemons to check to see if they are on a coprocessor. The former is done via hwloc, but support for the latter is not yet provided by hwloc. So the code for detecting we are on a coprocessor currently is Xeon Phi specific - hopefully, we will find more generic methods in the future.

4. modify the orted and the hnp startup so they check for coprocessors and to see if they are on a coprocessor, and have the orteds pass that info back in their callback message. Automatically detect that coprocessors have been found and identify which coprocessors are on which hosts. Note that this algo isn't scalable at the moment - this will hopefully be improved over time.

5. modify the ompi proc locality detection function to look for coprocessor host info IF the OMPI_RTE_HOST_ID database key has been defined. RTE's that choose not to provide this support do not have to do anything - the associated code will simply be ignored.

6. include some cleanup of the hwloc open/close code so it conforms to how we did things in other frameworks (e.g., having a single "frame" file instead of open/close). Also, fix the locality flags - e.g., being on the same node means you must also be on the same cluster/cu, so ensure those flags are also set.
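
The host/board split in items 1, 2 and 6 can be pictured as a small bitmask. This is only a sketch: the bit values and the SKETCH_ names are invented here and may not match the real OPAL definitions.

```c
#include <stdint.h>

/* Hypothetical bit values; the real OPAL flags may differ. */
#define SKETCH_PROC_ON_CLUSTER 0x01u
#define SKETCH_PROC_ON_CU      0x02u
#define SKETCH_PROC_ON_HOST    0x04u
#define SKETCH_PROC_ON_BOARD   0x08u

/* Sharing a host is enough for the "host" test. */
#define SKETCH_ON_LOCAL_HOST(flags) \
    (((flags) & SKETCH_PROC_ON_HOST) != 0)

/* "Same node" must explicitly check both host and board. */
#define SKETCH_ON_LOCAL_NODE(flags)                                   \
    (((flags) & (SKETCH_PROC_ON_HOST | SKETCH_PROC_ON_BOARD)) ==      \
     (SKETCH_PROC_ON_HOST | SKETCH_PROC_ON_BOARD))

/* Build the flags for a peer. Being on the same host implies the same
 * cluster and cu, so those bits are set along with the host bit. */
uint16_t sketch_locality(int same_host, int same_board)
{
    uint16_t flags = 0;

    if (same_host) {
        flags |= SKETCH_PROC_ON_CLUSTER | SKETCH_PROC_ON_CU | SKETCH_PROC_ON_HOST;
        if (same_board) {
            flags |= SKETCH_PROC_ON_BOARD;
        }
    }
    return flags;
}
```

A process on a coprocessor card would get `sketch_locality(1, 0)`: it passes the host test but not the node test, which is the distinction the commit introduces.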

cmr:v1.7.4:reviewer=hjelmn

This commit was SVN r29435.
2013-10-14 16:52:58 +00:00
Ralph Castain
9902748108 ***** THIS INCLUDES A SMALL CHANGE IN THE MPI-RTE INTERFACE *****
Fix two problems that surfaced when using direct launch under SLURM:

1. locally store our own data because some BTLs want to retrieve 
   it during add_procs rather than use what they have internally

2. cleanup MPI_Abort so it correctly passes the error status all
   the way down to the actual exit. When someone implemented the
   "abort_peers" API, they left out the error status. So we lost
   it at that point and *always* exited with a status of 1. This 
   forces a change to the API to include the status.

cmr:v1.7.3:reviewer=jsquyres:subject=Fix MPI_Abort and modex_recv for direct launch

This commit was SVN r29405.
2013-10-08 18:37:59 +00:00
Ralph Castain
9389592e05 Fix --without-hwloc build
This commit was SVN r29399.
2013-10-08 15:02:47 +00:00
Ralph Castain
2bd2284b93 Add a useful test and update another
This commit was SVN r29370.
2013-10-04 15:21:40 +00:00
Ralph Castain
697fb253fa Minor modification to test code
This commit was SVN r29357.
2013-10-04 03:11:31 +00:00
Ralph Castain
f4f2287958 Singletons currently start out by spawning an HNP - this is required solely in the cases where the singleton subsequently calls MPI_Comm_spawn or publishes port info without support from an external orte-server. In all other cases, the HNP is of no value and can actually be a detriment by creating additional overhead on the node. This is particularly concerning for async operations where processes may begin as singletons and then dynamically wireup to perform pt2pt communications.
So we now allow singletons to start on their own, only spawning an HNP when initiating an operation that actually requires it.

cmr:v1.7.4:reviewer=jsquyres

This commit was SVN r29354.
2013-10-04 02:58:26 +00:00
Nathan Hjelm
11722457ce Fix typo in grpcomm_pmi_module.c that was giving the wrong locality for direct launched jobs. Refs trac:3824
This commit was SVN r29348.

The following Trac tickets were found above:
  Ticket 3824 --> https://svn.open-mpi.org/trac/ompi/ticket/3824
2013-10-03 14:38:45 +00:00
Ralph Castain
2121e9c01b Fix an issue regarding use of PMI when running processes and tools that don't need or want to use it. We build PMI support based on configuration settings and library availability.
However, tools such as mpirun don't need it, and definitely shouldn't be using it. Ditto for procs launched by mpirun.

We used to have a way of dealing with this - we had the PMI component check to see if the process was the HNP or was launched by an HNP. Sadly, moving the OPAL db framework removed that ability as OPAL has no notion of HNPs or proc type.

So add a boolean flag to the db_base_select API that allows us to restrict selection to "local" components. This gives the PMI component the ability to reject itself as required. We then need to pass that param into the ess_base_std_app call so it can pass it all down.

This commit was SVN r29341.
2013-10-02 19:03:46 +00:00
Ralph Castain
5ec422dbc1 Correctly compute num local peers when launched via mpirun
This commit was SVN r29327.
2013-10-02 01:46:09 +00:00
Ralph Castain
71a24d6e74 Add some debug
This commit was SVN r29326.
2013-10-02 01:37:02 +00:00
Ralph Castain
c6b7d9d027 Fix variable declaration
This commit was SVN r29324.
2013-10-02 01:08:51 +00:00
Ralph Castain
fcb381c2e2 Minor cleanup - should behave the same, but just cleanup the variable names to avoid confusion
This commit was SVN r29323.
2013-10-02 00:10:36 +00:00
Ralph Castain
d565a76814 Do some cleanup of the way we handle modex data. Identify data that needs to be shared with peers in my job vs data that needs to be shared with non-peers - no point in sharing extra data. When we share data with some process(es) from another job, we cannot know in advance what info they have or lack, so we have to share everything just in case. This limits the optimization we can do for things like comm_spawn.
Create a new required key in the OMPI layer for retrieving a "node id" from the database. ALL RTE'S MUST DEFINE THIS KEY. This allows us to compute locality in the MPI layer, which is necessary when we do things like intercomm_create.

cmr:v1.7.4:reviewer=rhc:subject=Cleanup handling of modex data

This commit was SVN r29274.
2013-09-27 00:37:49 +00:00
Ralph Castain
6522963b9c Flag that a daemon has been launched when it reports back to the HNP so we avoid re-launching it on spawns against dynamic allocations
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29245.
2013-09-25 16:58:19 +00:00
Ralph Castain
23c8848157 Only connect the first time thru the Torque launch, remove stale code
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29227.
2013-09-22 23:53:57 +00:00
Ralph Castain
400c68ed0f Fix a segfault when a topology file is given to use in place of the one detected by mpirun itself. In that situation, the rmaps framework replaces the opal_hwloc_topology structure - but since that occurs *after* mpirun has set the node->topology field, we lose that definition. So don't set the node->topology field until after the rmaps framework has been opened.
Does not need to go to 1.7 branch as that ordering is different.

This commit was SVN r29225.
2013-09-21 19:47:41 +00:00
Jeff Squyres
758cd25fff Move the MCA / MPI_T level of the LAMA component down to 5 (from 9).
This commit was SVN r29214.
2013-09-20 15:23:27 +00:00
George Bosilca
273d66d0f2 The MPI_Intercomm_create test was broken, as the remote peer was
always considered as being 1 (instead of count).

This commit was SVN r29207.
2013-09-18 16:47:54 +00:00
Ralph Castain
865a7028f8 Per patch from George, with a few minor cleanups. Correctly address the complete exchange of required wireup information in Intercomm_create so all procs in the resulting communicator know how to talk to each other.
Refs trac:29166

This commit was SVN r29200.

The following Trac tickets were found above:
  Ticket 29166 --> https://svn.open-mpi.org/trac/ompi/ticket/29166
2013-09-18 02:01:30 +00:00
Ralph Castain
99611ac1d2 Revert r29166 in favor of a better solution from George
This commit was SVN r29199.

The following SVN revision numbers were found above:
  r29166 --> open-mpi/ompi@497c7e6abb
2013-09-18 01:41:26 +00:00
George Bosilca
9e6c3c0646 Save the error code.
This commit was SVN r29196.
2013-09-17 23:50:11 +00:00
Ralph Castain
2680bff88e The function orte_iof_base_setup_prefork attempts to create a pty for
child stdout and falls back to plain pipe if openpty fails. Child uses
the 'usepty' flag to decide whether to treat this descriptor as a pty
or as a pipe.
Set 'usepty' flag to 0 upon openpty failure to inform the child that
it isn't dealing with a pty even though pty has been requested.
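
The fallback described above might look roughly like the following. This is a sketch, not the real orte_iof_base_setup_prefork: the struct, the function names, and the injected opener (used here in place of a direct openpty call so the failure path is easy to exercise) are all assumptions.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical options block: usepty records what the child should assume. */
typedef struct {
    int usepty;     /* 1 = descriptor is a pty, 0 = plain pipe */
    int out_fds[2]; /* descriptors for child stdout */
} sketch_io_opts_t;

/* Opener callback: returns 0 and fills fds on success, -1 on failure. */
typedef int (*sketch_pty_opener_fn)(int fds[2]);

/* Stand-in opener that always fails, as on a system without free ptys. */
int sketch_failing_pty(int fds[2])
{
    (void)fds;
    return -1;
}

/* Try a pty when requested; on failure clear usepty BEFORE falling back
 * to a pipe, so the child never treats the pipe as a pty. */
int sketch_setup_prefork(sketch_io_opts_t *opts, sketch_pty_opener_fn open_pty)
{
    if (opts->usepty) {
        if (open_pty != NULL && open_pty(opts->out_fds) == 0) {
            return 0;
        }
        opts->usepty = 0; /* the fix: tell the child it has a pipe */
    }
    return pipe(opts->out_fds);
}
```

With `sketch_failing_pty` as the opener, the call still succeeds through the pipe path and leaves `usepty` at 0.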


Thanks to Michal Peclo for reporting it and providing a patch.

cmr:v1.7.3:reviewer=jsquyres
cmr:v1.6.6:reviewer=jsquyres

This commit was SVN r29169.
2013-09-15 15:33:51 +00:00
Ralph Castain
b64c8dafd8 Cleanup some errors in pubsub - must set the active flag before posting the recv in case the message has already arrived
Refs trac:3696

This commit was SVN r29167.

The following Trac tickets were found above:
  Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696
2013-09-15 15:26:32 +00:00
Ralph Castain
497c7e6abb Fixes trac:2904
The intercomm "merge" function can create a linkage between procs that was not reflected anywhere in a modex, and so at least some of the procs in the resulting communicator don't know how to talk to some of the new communicator's peers.

For example, consider the case where:

1. parent job A comm_spawns a process (job B) - these processes exchange modex and can communicate

2. parent job A now comm_spawns another process (job C) - again, these can communicate, but the proc in C knows nothing of B

3. do an intercomm merge across the communicators created by the two comm_spawns. This puts B and C into the same communicator, but they know nothing about how to talk to each other as they were not involved in any exchange of contact info. Hence, collectives on that communicator now fail. 

This fix adds an API to the ompi/dpm framework that (a) exchanges the modex info across the procs in the merge to ensure all procs know how to communicate, and (b) calls add_procs to give the btl's a chance to select transports to any new procs.

cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29166.

The following Trac tickets were found above:
  Ticket 2904 --> https://svn.open-mpi.org/trac/ompi/ticket/2904
2013-09-15 15:00:40 +00:00
Ralph Castain
eb132f923b Check for bozo error of negative np for an app as this will cause ORTE to spin forever.
cmr:v1.7.3:reviewer=jsquyres:subject=Check for negative np
cmr:v1.6.6:reviewer=jsquyres:subject=Check for negative np

This commit was SVN r29157.
2013-09-11 19:21:22 +00:00
Ralph Castain
2a116ecdfc Fix a race condition created when two processes attempt to send to each other at the same time. This causes both processes to start connection procedures, resulting in a conflict that can cause messages to be lost. Add detection of this condition, and have both processes cancel their connect operations. The process with the higher rank will reconnect, while the lower rank process will simply wait for the connection to be created.
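
The tie-break described above can be reduced to a single comparison. The sketch below assumes both sides have already cancelled their in-flight connect attempts; the type and function names are hypothetical.

```c
/* What a process does after detecting a simultaneous-connect collision. */
typedef enum {
    SKETCH_RECONNECT,    /* this side initiates the new connection */
    SKETCH_WAIT_FOR_PEER /* this side waits for the peer to connect */
} sketch_collision_action_t;

/* The higher rank reconnects and the lower rank waits, so exactly one
 * connection attempt survives the collision. */
sketch_collision_action_t sketch_resolve_collision(unsigned my_rank,
                                                   unsigned peer_rank)
{
    return (my_rank > peer_rank) ? SKETCH_RECONNECT : SKETCH_WAIT_FOR_PEER;
}
```

Because both peers evaluate the same comparison with swapped arguments, they always reach complementary decisions without exchanging any further messages.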

Refs trac:3696

This commit was SVN r29139.

The following Trac tickets were found above:
  Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696
2013-09-06 05:15:25 +00:00
Ralph Castain
e8697de521 Deal with PGI compilers on the Mac by initializing a global variable.
cmr:v1.6.6:reviewer=jsquyres
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29129.
2013-09-05 21:40:50 +00:00
Ralph Castain
13ae51a91b Protect against possible race conditions and threads by ensuring that rml send always occurs inside an event.
cmr:v1.7.4:reviewer=jsquyres:subject=Protect against race conditions in rml send

This commit was SVN r29128.
2013-09-05 01:16:32 +00:00
Ralph Castain
d32dfc96be Use the rankfile to obtain list of nodes for VM launch if/when rankfile is given.
cmr:v1.7.3:reviewer=jsquyres:subject=Obtain VM nodes from rankfile

This commit was SVN r29119.
2013-09-04 16:37:30 +00:00
Ralph Castain
d9f0505952 Fix the lama verbose outputs so they don't segfault if someone asks for verbose output, but isn't using lama
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29108.
2013-09-03 17:55:35 +00:00
Ralph Castain
2bfa99e945 If a rankfile is given and the number of procs not specified in the mpirun cmd line, then set the number of procs to the number of ranks in the rankfile
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29104.
2013-09-02 15:04:40 +00:00
Ralph Castain
43d1cd92ac Ensure we activate the "daemons launched" state when only the HNP is left or else we will hang.
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29094.
2013-08-29 22:50:51 +00:00
Dave Goodell
d17f104e7a oob: squash some valgrind warnings
These warnings were harmless, but they appeared even for simple programs
like single-process runs of `ring_c`.

This commit was SVN r29093.
2013-08-29 21:08:44 +00:00
Ralph Castain
12d4f45b5e Silence warning:
oob_tcp_connection.c: In function 'mca_oob_tcp_peer_accept':
oob_tcp_connection.c:725:9: warning: variable 'cmpval' set but not used [-Wunused-but-set-variable]

Refs trac:3696

This commit was SVN r29091.

The following Trac tickets were found above:
  Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696
2013-08-29 20:56:05 +00:00
Ralph Castain
7a7cfdd519 A little cleanup - the base function to sort numa lists must return something or you get a warning about non-void function returning without value, so cleanup the return values. Ensure the mindist module actually checks for a return of "error" so it won't segfault, and have it emit a polite message when that happens.
cmr:v1.7.3:reviewer=jladd

This commit was SVN r29089.
2013-08-29 20:01:06 +00:00
Ralph Castain
c71e760e6c The modex code was unfortunately written solely for PMI1 when updated to minimize calls to PMI_get - add the required PMI2 code
This commit was SVN r29084.
2013-08-28 23:52:32 +00:00
Joshua Ladd
1802aabf1a Add support for autodetecting a MLNX HCA in the rmaps min distance feature. In this way, .ini files distributed with software stacks need not specify a particular HCA but instead may select the key word auto which will automatically select the discovered device. To use this feature, simply pass the keyword auto instead of a specific device name, --mca rmaps_base_dist_hca auto. If more than one card is installed, the mapper will inform the user of this and, at this point, the user will then need to specify which card via the normal route, e.g. --mca rmaps_base_dist_hca <dev_name>. This should be added to
cmr=v1.7.4:reviewer=rhc:subject=Autodetect logic for min dist mapping
This commit was SVN r29079.
2013-08-28 16:23:33 +00:00
Ralph Castain
7125143253 Replace missing opal_db open/select that was apparently lost on a prior merge. Thanks to Nathan for pointing it out
This commit was SVN r29072.
2013-08-27 19:42:31 +00:00
George Bosilca
65a362909d Can't see how it works ...
Thanks Thomas and Arm for the patch.

This commit was SVN r29066.
2013-08-27 16:52:24 +00:00
Ralph Castain
c9a25465da Don't need the number of nodes any more for PMI
Refs trac:3729

This commit was SVN r29064.

The following Trac tickets were found above:
  Ticket 3729 --> https://svn.open-mpi.org/trac/ompi/ticket/3729
2013-08-23 18:36:51 +00:00
Ralph Castain
6d24b34940 Extend the dpm framework API to support persistent accept/connect operations:
* paccept - establish a persistent listening port for async connect requests

* pconnect - async connect to remote process that has posted a paccept port. Provides a timeout mechanism, and allows the underlying implementation to retry until timeout 

* pclose - shuts down a prior paccept posting

Includes example programs paccept.c and pconnect.c in orte/test/mpi. New MPI extension interfaces coming...

This commit was SVN r29063.
2013-08-23 18:02:50 +00:00
Ralph Castain
a200e4f865 As per the RFC, bring in the ORTE async progress code and the rewrite of OOB:
*** THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE ***

Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro.
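
A wait macro of the kind described in the note above could be sketched as follows. The real OMPI_WAIT_FOR_COMPLETION cycles across opal_progress; here a hypothetical `sketch_progress` stands in for it and pretends the awaited event completes on the third call.

```c
#include <stdbool.h>

static int sketch_progress_calls = 0;      /* how often we were progressed */
static int sketch_events_left = 3;         /* pretend completion after 3 calls */
static volatile bool sketch_active = true; /* cleared when the event completes */

/* Hypothetical stand-in for the progress engine. */
void sketch_progress(void)
{
    sketch_progress_calls++;
    if (--sketch_events_left == 0) {
        sketch_active = false;
    }
}

/* Spin on the progress function until the flag becomes false. */
#define SKETCH_WAIT_FOR_COMPLETION(flag) \
    do {                                 \
        while ((flag)) {                 \
            sketch_progress();           \
        }                                \
    } while (0)

/* Block until the pending event completes; returns the number of
 * progress calls that were needed. */
int sketch_wait_demo(void)
{
    SKETCH_WAIT_FOR_COMPLETION(sketch_active);
    return sketch_progress_calls;
}
```

The flag is declared volatile so the loop rereads it on each pass; in a design with a separate progress thread, stronger synchronization than this sketch shows would be needed.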

***************************************************************************************

I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week.

The code is in  https://bitbucket.org/rhc/ompi-oob2


WHAT:    Rewrite of ORTE OOB

WHY:       Support asynchronous progress and a host of other features

WHEN:    Wed, August 21

SYNOPSIS:
The current OOB has served us well, but a number of limitations have been identified over the years. Specifically:

* it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code)

* we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface.

* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients

* there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort

* only one transport (i.e., component) can be "active"


The revised OOB resolves these problems:

* async progress is used for all application processes, with the progress thread blocking in the event library

* each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on")

* multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC.

* a message that arrives on one TCP NIC is automatically shifted to whichever NIC is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions.

* opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object

* NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions

* obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel

* the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport

* routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active

* all blocking send/recv APIs have been removed. Everything operates asynchronously.
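
The reachability rule for multi-address NICs in the list above amounts to a masked comparison. The following sketch covers IPv4 only, with addresses in host byte order; the struct, macro, and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Build a host-byte-order IPv4 address from dotted-quad components. */
#define SKETCH_IPV4(a, b, c, d)                              \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) |         \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* One address/mask pair assigned to a NIC. */
typedef struct {
    uint32_t addr;
    uint32_t mask;
} sketch_nic_addr_t;

/* A peer is considered reachable through this NIC if its address falls
 * inside the subnet of any of the NIC's address/mask pairs. */
bool sketch_addr_reachable(const sketch_nic_addr_t *nic_addrs, size_t n,
                           uint32_t peer_addr)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t mask = nic_addrs[i].mask;
        if ((peer_addr & mask) == (nic_addrs[i].addr & mask)) {
            return true;
        }
    }
    return false;
}
```

A NIC holding both 192.168.1.10/24 and 10.0.0.5/8 would therefore reach 10.1.2.3 but not 192.168.2.1. As the message itself notes, a shared-subnet test like this cannot account for routing via gateways.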


KNOWN LIMITATIONS:

* although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline

* the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker

* routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways

* obviously, not every error path has been tested nor necessarily covered

* determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when *all* transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost.

* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways

* the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC
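
A minimal sketch of the per-peer sequencing described in the last item, assuming a small fixed reorder window. The names `peer_rx_t`, `peer_rx_init`, `peer_rx_accept` and `WINDOW` are hypothetical and are not part of the RML; this only illustrates how a receiver could hold back messages that arrive early over a different transport and release them in order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW 8  /* hypothetical cap on buffered out-of-order messages */

/* Hypothetical per-peer receive state (not an ORTE structure). */
typedef struct {
    uint32_t next_seq;            /* next sequence number to deliver */
    const char *pending[WINDOW];  /* slot = seq % WINDOW */
    int present[WINDOW];          /* 1 if the slot holds a message */
} peer_rx_t;

void peer_rx_init(peer_rx_t *rx)
{
    assert(rx != NULL);
    rx->next_seq = 0;
    for (int i = 0; i < WINDOW; i++) {
        rx->pending[i] = NULL;
        rx->present[i] = 0;
    }
}

/* Accept one message from a peer. Messages that are now deliverable are
 * written to out[] in sequence order (out must hold WINDOW entries).
 * Returns the number delivered, or -1 for a duplicate, an already
 * delivered sequence number, or one too far ahead of the window. */
int peer_rx_accept(peer_rx_t *rx, uint32_t seq, const char *msg,
                   const char **out)
{
    assert(rx != NULL && out != NULL);

    /* Unsigned subtraction: old sequence numbers wrap to a large value. */
    uint32_t ahead = seq - rx->next_seq;
    if (ahead >= WINDOW) {
        return -1;
    }

    size_t slot = seq % WINDOW;
    if (rx->present[slot]) {
        return -1;  /* same sequence number seen twice (e.g., a resend) */
    }
    rx->pending[slot] = msg;
    rx->present[slot] = 1;

    /* Release every consecutive message starting at next_seq. */
    int delivered = 0;
    while (rx->present[rx->next_seq % WINDOW]) {
        size_t s = rx->next_seq % WINDOW;
        out[delivered++] = rx->pending[s];
        rx->present[s] = 0;
        rx->next_seq++;
    }
    return delivered;
}
```

With this sketch, messages 1 and 2 arriving before message 0 are held, and all three are released together once message 0 arrives. A resent copy of an already delivered message is rejected.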

This commit was SVN r29058.
2013-08-22 16:37:40 +00:00
Ralph Castain
63d10d2d0d Fix typo
Refs trac:3729

This commit was SVN r29057.

The following Trac tickets were found above:
  Ticket 3729 --> https://svn.open-mpi.org/trac/ompi/ticket/3729
2013-08-22 16:05:58 +00:00
Ralph Castain
16c5b30a1f Since the calls to "PMI get" scale by number of procs (not nodes), it makes more sense to have the MCA param be the cutoff based on number of procs. Also, it occurred to me that this shouldn't impact the nidmap process as that is built and circulated when we launch via mpirun, not during direct launch.
So shift the cutoff param to the MPI layer, and have it solely determine whether or not we call modex_recv on the hostname. If comm_world is of size greater than the cutoff, then we don't automatically retrieve the hostname when we build the ompi_proc_t for a process - instead, we fill the hostname entry on first call to modex_recv for that process.

The param is now "ompi_hostname_cutoff=N", where N=number of procs for cutoff.
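
A rough sketch of the cutoff behavior described above, using hypothetical stand-ins (`proc_t`, `fetch_hostname`, `proc_init`, `proc_modex_recv_hostname`) rather than the real ompi_proc_t and modex_recv code. The hostnames returned by the stand-in lookup are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ompi_proc_t; only the fields needed here. */
typedef struct {
    int rank;
    const char *hostname;  /* NULL until it has been retrieved */
} proc_t;

int fetch_count = 0;  /* counts the expensive lookups, for illustration */

/* Stand-in for the costly per-proc lookup (e.g., a PMI "get"). */
const char *fetch_hostname(int rank)
{
    fetch_count++;
    return (rank % 2 == 0) ? "node0" : "node1";
}

/* Build the proc entry: skip the hostname when the job exceeds the cutoff. */
void proc_init(proc_t *p, int rank, int world_size, int cutoff)
{
    assert(p != NULL);
    p->rank = rank;
    p->hostname = (world_size > cutoff) ? NULL : fetch_hostname(rank);
}

/* On the first modex-style request for this proc, fill in the hostname. */
const char *proc_modex_recv_hostname(proc_t *p)
{
    assert(p != NULL);
    if (p->hostname == NULL) {
        p->hostname = fetch_hostname(p->rank);
    }
    return p->hostname;
}
```

Code that reads the hostname field directly must be prepared for NULL when the job is larger than the cutoff and no modex request has been made yet for that process.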

Refs trac:3729

This commit was SVN r29056.

The following Trac tickets were found above:
  Ticket 3729 --> https://svn.open-mpi.org/trac/ompi/ticket/3729
2013-08-22 03:40:26 +00:00
Ralph Castain
45e695928f As per the email discussion, revise the sparse handling of hostnames so that we avoid potential infinite loops while allowing large-scale users to improve their startup time:
* add a new MCA param orte_hostname_cutoff to specify the number of nodes at which we stop including hostnames. This defaults to INT_MAX => always include hostnames. If a value is given, then we will include hostnames for any allocation smaller than the given limit.

* remove ompi_proc_get_hostname. Replace all occurrences with a direct link to ompi_proc_t's proc_hostname, protected by appropriate "if NULL"

* modify the OMPI-ORTE integration component so that any call to modex_recv automatically loads the ompi_proc_t->proc_hostname field as well as returning the requested info. Thus, any process whose modex info you retrieve will automatically receive the hostname. Note that on-demand retrieval is still enabled - i.e., if we are running under direct launch with PMI, the hostname will be fetched upon first call to modex_recv, and then the ompi_proc_t->proc_hostname field will be loaded

* removed a stale MCA param "mpi_keep_peer_hostnames" that was no longer used anywhere in the code base

* added an envar lookup in ess/pmi for the number of nodes in the allocation. Sadly, PMI itself doesn't provide that info, so we have to get it a different way. Currently, we support PBS-based systems and SLURM - for any other, rank0 will emit a warning and we assume max number of daemons so we will always retain hostnames

This commit was SVN r29052.
2013-08-20 18:59:36 +00:00
Ralph Castain
9aebd7e281 Ensure we register the nidmap verbosity in mpirun, and add some debug
This commit was SVN r29042.
2013-08-18 23:40:32 +00:00
Ralph Castain
611d7f9f6b When we direct launch an application, we rely on PMI for wireup support. In doing so, we lose the de facto data compression we get from the ORTE modex since we no longer get all the wireup info from every proc in a single blob. Instead, we have to iterate over all the procs, calling PMI_KVS_get for every value we require.
This creates really bad scaling behavior. Users have found a nearly 20% launch time differential between mpirun and PMI, with PMI being the slower method. Some of the problem is attributable to poor exchange algorithms in RMs like Slurm and Alps, but we make things worse by calling "get" so many times.

Nathan (with a tad of advice from me) has attempted to alleviate this problem by reducing the number of "get" calls. This required the following changes:

* upon first request for data, have the OPAL db pmi component fetch and decode *all* the info from a given remote proc. It turned out we weren't caching the info, so we would continually request it and only decode the piece we needed for the immediate request. We now decode all the info and push it into the db hash component for local storage - and then all subsequent retrievals are fulfilled locally

* reduced the amount of data by eliminating the exchange of the OMPI_ARCH value if heterogeneity is not enabled. This was used solely as a check so we would error out if the system wasn't actually homogeneous, which was fine when we thought there was no cost in doing the check. Unfortunately, at large scale and with direct launch, there is a non-zero cost of making this test. We are open to finding a compromise (perhaps turning the test off if requested?), if people feel strongly about performing the test

* reduced the amount of RTE data being automatically fetched, and fetched the rest only upon request. In particular, we no longer immediately fetch the hostname (which is only used for error reporting), but instead get it when needed. Likewise for the RML uri as that info is only required for some (not all) environments. In addition, we no longer fetch the locality unless required, relying instead on the PMI clique info to tell us who is on our local node (if additional info is required, the fetch is performed when a modex_recv is issued).

Again, all this only impacts direct launch - all the info is provided when launched via mpirun as there is no added cost to getting it
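
The "fetch and decode everything on first request, then serve locally" idea from the first item can be sketched as follows. Everything here is an assumption for illustration: the `key=value;key=value` blob format, the sample values, and the names `peer_cache_t`, `remote_get_blob`, `cache_load` and `cache_get` are hypothetical and do not reflect the actual db pmi or db hash components.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_KEYS 8

typedef struct { char key[32]; char val[64]; } kv_t;

/* Hypothetical local store for one remote proc's decoded data. */
typedef struct {
    int loaded;          /* 0 until the remote blob has been decoded */
    int nkeys;
    kv_t kv[MAX_KEYS];
} peer_cache_t;

int remote_gets = 0;     /* counts remote fetches, for illustration */

/* Stand-in for the expensive remote "get"; blob format is invented. */
const char *remote_get_blob(int peer)
{
    (void)peer;
    remote_gets++;
    return "hostname=node7;arch=x86_64;uri=tcp://10.0.0.7:5000";
}

/* Copy src into dst, truncating to fit and always NUL-terminating. */
void copy_bounded(char *dst, size_t cap, const char *src)
{
    size_t n = strlen(src);
    if (n >= cap) n = cap - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Fetch the blob once and decode *all* entries into the local cache. */
void cache_load(peer_cache_t *c, int peer)
{
    char buf[256];
    copy_bounded(buf, sizeof(buf), remote_get_blob(peer));
    c->nkeys = 0;
    for (char *tok = strtok(buf, ";");
         tok != NULL && c->nkeys < MAX_KEYS;
         tok = strtok(NULL, ";")) {
        char *eq = strchr(tok, '=');
        if (eq == NULL) continue;
        *eq = '\0';
        kv_t *e = &c->kv[c->nkeys++];
        copy_bounded(e->key, sizeof(e->key), tok);
        copy_bounded(e->val, sizeof(e->val), eq + 1);
    }
    c->loaded = 1;
}

/* Only the first lookup goes remote; later ones are served locally. */
const char *cache_get(peer_cache_t *c, int peer, const char *key)
{
    assert(c != NULL && key != NULL);
    if (!c->loaded) cache_load(c, peer);
    for (int i = 0; i < c->nkeys; i++) {
        if (strcmp(c->kv[i].key, key) == 0) return c->kv[i].val;
    }
    return NULL;
}
```

In this sketch, any number of lookups for the same peer costs a single remote fetch, which is the effect the change is after.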

Barring objections, we may move this (plus any required other pieces) to the 1.7 branch once it soaks for an appropriate time.

This commit was SVN r29040.
2013-08-17 00:49:18 +00:00
Ralph Castain
b2d86e1857 Silence uninitialized var warning
This commit was SVN r29034.
2013-08-16 21:35:51 +00:00
Ralph Castain
b34bff8792 Cleanup warning
This commit was SVN r29032.
2013-08-16 21:14:35 +00:00
Ralph Castain
bebe852057 Add new info key for publish that allows user to designate that the port is to be unique - i.e., to return an error if that service has already been published. Default is to overwrite
This commit was SVN r29028.
2013-08-14 04:21:17 +00:00
Ralph Castain
72b5e867ab Correct shutdown ordering - rml must go last
This commit was SVN r29027.
2013-08-14 04:20:17 +00:00
Ralph Castain
8a4c5f4957 Attempt to plug a few memory leaks by ensuring we finalize all things opened during init. However, we are still leaking memory like a sieve in param registration and hwloc.
This commit was SVN r29026.
2013-08-14 02:03:00 +00:00
Nathan Hjelm
b2e773ece3 Fix debugger support for direct-launched jobs.
The orte rte component checks the orte_standalone_operation to decide
if it should wait for a message from the hnp or wait on the debugger.
This variable needed to be set to true in ess/pmi to enable the
correct path when direct launching.

cmr=v1.7.3:reviewer=rhc
cmr=v1.6.6:reviewer=rhc

This commit was SVN r29013.
2013-08-09 22:39:41 +00:00
Nathan Hjelm
841ed962f6 fix MCA variable and component system leaks
cmr=v1.7.3:reviewer=rhc

This commit was SVN r29011.
2013-08-09 19:50:28 +00:00
Nathan Hjelm
88cadc552d Make opal/db/pmi use as few PMI keys as possible.
This commit reintroduces key compression into the pmi db. This feature
compresses the keys stored into the component into a small number of
PMI keys by serializing the data and base64 encoding the result. This
will avoid issues with Cray PMI which restricts us to ~ 3 PMI keys per
rank.
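
As an illustration only, the general approach (serialize several entries into one buffer, base64 encode it, then split the result across a few PMI values) could look like the sketch below. The packing format and the helper names `pack_kv`, `b64_encode` and `pmi_keys_needed` are hypothetical and are not taken from the component.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Append "key\0value\0" to buf. Returns the new used length, or 0 if the
 * pair does not fit in cap bytes. */
size_t pack_kv(unsigned char *buf, size_t used, size_t cap,
               const char *key, const char *val)
{
    size_t kl = strlen(key) + 1, vl = strlen(val) + 1;
    if (used + kl + vl > cap) return 0;
    memcpy(buf + used, key, kl);
    memcpy(buf + used + kl, val, vl);
    return used + kl + vl;
}

/* Standard base64 with '=' padding. out must hold 4*((len+2)/3)+1 bytes.
 * Returns the encoded length (excluding the terminating NUL). */
size_t b64_encode(const unsigned char *in, size_t len, char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i += 3) {
        unsigned v = (unsigned)in[i] << 16;
        if (i + 1 < len) v |= (unsigned)in[i + 1] << 8;
        if (i + 2 < len) v |= in[i + 2];
        out[o++] = B64[(v >> 18) & 63];
        out[o++] = B64[(v >> 12) & 63];
        out[o++] = (i + 1 < len) ? B64[(v >> 6) & 63] : '=';
        out[o++] = (i + 2 < len) ? B64[v & 63] : '=';
    }
    out[o] = '\0';
    return o;
}

/* Number of PMI keys needed if each value holds at most max_val_len chars. */
size_t pmi_keys_needed(size_t enc_len, size_t max_val_len)
{
    assert(max_val_len > 0);
    return (enc_len + max_val_len - 1) / max_val_len;
}
```

Base64 is useful here because the serialized buffer contains NUL bytes and other values that a string-valued key store cannot carry directly. The encoded text is about a third larger than the raw data.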

This commit was SVN r28993.
2013-08-03 01:06:59 +00:00
Ralph Castain
285429a1c6 Remove release of buffer - non-blocking send callback will do it
This commit was SVN r28985.
2013-08-02 03:49:17 +00:00
Ralph Castain
37db1727a2 Refs trac:3710
Simplify the whole stripping of prefix method by consolidating it into a single MCA param. Allow for multiple prefixes to be stripped, each separated in the param by a comma. If no prefix is given, or the specified prefix isn't in the nodename, then just use the hostname itself.
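
A small sketch of the comma-separated prefix handling described above. It assumes a prefix only counts when it appears at the start of the node name, and the function name `strip_prefix` is hypothetical. The real code may apply further processing after removing the prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer into nodename just past the first matching prefix from
 * the comma-separated list. If the list is NULL or empty, or no prefix
 * matches, return nodename unchanged. */
const char *strip_prefix(const char *nodename, const char *prefixes)
{
    assert(nodename != NULL);
    if (prefixes == NULL) {
        return nodename;
    }
    const char *p = prefixes;
    while (*p != '\0') {
        size_t n = strcspn(p, ",");  /* length of the current prefix */
        if (n > 0 && strncmp(nodename, p, n) == 0) {
            return nodename + n;
        }
        p += n;
        if (*p == ',') {
            p++;                     /* skip the separator */
        }
    }
    return nodename;
}
```

For example, with the list "nid,c" the name "nid00012" becomes "00012", while "login1" matches neither prefix and is returned as is.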

This commit was SVN r28974.

The following Trac tickets were found above:
  Ticket 3710 --> https://svn.open-mpi.org/trac/ompi/ticket/3710
2013-08-01 00:32:10 +00:00
Nathan Hjelm
83a3fc2fd2 Add an option to control which hostnames orte_strip_prefix_from_node_names works
on.

This corrects a problem with Cray systems where the login node's hostname
was being stripped causing the login node to be used as a compute node by
mpirun.

cmr=v1.7.3:reviewer=rhc

This commit was SVN r28970.
2013-07-31 18:42:02 +00:00
Nathan Hjelm
ebbb32120a MCA/base: variable system updates
- Use an enumerator to handle bool values.

 - Fix a leak in the variable enumerator.

 - Fix a leak in an orte parameter.

This commit was SVN r28949.
2013-07-25 15:42:01 +00:00
Ralph Castain
6c1a140e99 Per request from Nathan, add a "commit" API to the opal db framework. This allows him to aggregate keys to work around the Cray's severe PMI limitations
This commit was SVN r28917.
2013-07-22 22:57:16 +00:00
Ralph Castain
5d12ab3873 Ensure we always set num_local_peers for both PMI2 and PMI1
This commit was SVN r28860.
2013-07-19 04:34:58 +00:00
Ralph Castain
b033a6b6d6 One last Cray-inspired fix...
Refs trac:3685

This commit was SVN r28857.

The following Trac tickets were found above:
  Ticket 3685 --> https://svn.open-mpi.org/trac/ompi/ticket/3685
2013-07-19 03:04:00 +00:00
Ralph Castain
92cb93b21e Remove set-but-unused variable
Refs trac:3685

This commit was SVN r28855.

The following Trac tickets were found above:
  Ticket 3685 --> https://svn.open-mpi.org/trac/ompi/ticket/3685
2013-07-19 01:42:35 +00:00
Ralph Castain
bc2586cf3c Refs trac:3685. Check error code returned by PMI2_Info_GetJobAttr.
This commit was SVN r28854.

The following Trac tickets were found above:
  Ticket 3685 --> https://svn.open-mpi.org/trac/ompi/ticket/3685
2013-07-19 01:24:51 +00:00
Ralph Castain
a10546d5c1 Cleanup and rename of platform files
This commit was SVN r28853.
2013-07-19 01:18:41 +00:00
Ralph Castain
e4e678e234 Per the RFC and discussion on the devel list, update the RTE-MPI error handling interface. There are a few differences in the code from the original RFC that came out of the discussion - I've captured those in the following writeup
George and I were talking about ORTE's error handling the other day in regards to the right way to deal with errors in the updated OOB. Specifically, it seemed a bad idea for a library such as ORTE to be aborting the job on its own prerogative. If we lose a connection or cannot send a message, then we really should just report it upwards and let the application and/or upper layers decide what to do about it.

The current code base only allows a single error callback to exist, which seemed unduly limiting. So, based on the conversation, I've modified the errmgr interface to provide a mechanism for registering any number of error handlers (this replaces the current "set_fault_callback" API). When an error occurs, these handlers will be called in order until one responds that the error has been "resolved" - i.e., no further action is required - by returning OMPI_SUCCESS. The default MPI layer error handler is specified to go "last" and calls mpi_abort, so the current "abort" behavior is preserved unless other error handlers are registered.

In the register_callback function, I provide an "order" param so you can specify "this callback must come first" or "this callback must come last". Seemed to me that we will probably have different code areas registering callbacks, and one might require it go first (the default "abort" will always require it go last). So you can append and prepend, or go first. Note that only one registration can declare itself "first" or "last", and since the default "abort" callback automatically takes "last", that one isn't available. :-)

The errhandler callback function passes an opal_pointer_array of structs, each of which contains the name of the proc involved (which can be yourself for internal errors) and the error code. This is a change from the current fault callback which returned an opal_pointer_array of just process names. Rationale is that you might need to see the cause of the error to decide what action to take. I realize that isn't a requirement for remote procs, but remember that we will use the SAME interface to report RTE errors internal to the proc itself. In those cases, you really do need to see the error code. It is legal to pass a NULL for the pointer array (e.g., when reporting an internal failure without error code), so handlers must be prepared for that possibility. If people find that too burdensome, we can remove it.

Should we ever decide to create a separate callback path for internal errors vs remote process failures, or if we decide to do something different based on experience, then we can adjust this API.
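
To make the ordering rules concrete, here is a simplified sketch of a handler chain with "first", "prepend", "append" and "last" registration. All names (`errmgr_t`, `errmgr_register`, `errmgr_report`, `order_t`, the sample handlers) are hypothetical, and for brevity each handler receives a single proc and error code rather than an opal_pointer_array of structs. The default handler only sets a flag instead of calling mpi_abort.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HANDLERS 8
enum { ERR_RESOLVED = 0, ERR_UNRESOLVED = -1 };

typedef enum { ORDER_FIRST, ORDER_PREPEND, ORDER_APPEND, ORDER_LAST } order_t;
typedef int (*err_handler_fn)(int proc, int errcode);

/* Hypothetical registry: one "first" slot, one "last" slot, and an
 * ordered list of handlers in between. Zero-initialize before use. */
typedef struct {
    err_handler_fn first;
    err_handler_fn last;
    err_handler_fn mid[MAX_HANDLERS];
    int nmid;
} errmgr_t;

/* Returns 0 on success, -1 if the requested slot is taken or the list is full. */
int errmgr_register(errmgr_t *m, err_handler_fn fn, order_t order)
{
    assert(m != NULL && fn != NULL);
    switch (order) {
    case ORDER_FIRST:
        if (m->first != NULL) return -1;   /* only one "first" allowed */
        m->first = fn;
        return 0;
    case ORDER_LAST:
        if (m->last != NULL) return -1;    /* only one "last" allowed */
        m->last = fn;
        return 0;
    case ORDER_PREPEND:
        if (m->nmid == MAX_HANDLERS) return -1;
        for (int i = m->nmid; i > 0; i--) m->mid[i] = m->mid[i - 1];
        m->mid[0] = fn;
        m->nmid++;
        return 0;
    case ORDER_APPEND:
        if (m->nmid == MAX_HANDLERS) return -1;
        m->mid[m->nmid++] = fn;
        return 0;
    }
    return -1;
}

/* Call handlers in order until one reports the error as resolved. */
int errmgr_report(const errmgr_t *m, int proc, int errcode)
{
    assert(m != NULL);
    if (m->first != NULL && m->first(proc, errcode) == ERR_RESOLVED)
        return ERR_RESOLVED;
    for (int i = 0; i < m->nmid; i++) {
        if (m->mid[i](proc, errcode) == ERR_RESOLVED) return ERR_RESOLVED;
    }
    return (m->last != NULL) ? m->last(proc, errcode) : ERR_UNRESOLVED;
}

/* Sample handlers for illustration. */
int abort_called = 0;

int default_abort_handler(int proc, int errcode)
{
    (void)proc; (void)errcode;
    abort_called = 1;          /* a real default would abort the job */
    return ERR_RESOLVED;
}

int tolerate_code_7(int proc, int errcode)
{
    (void)proc;
    return (errcode == 7) ? ERR_RESOLVED : ERR_UNRESOLVED;
}
```

In this sketch, once the default handler has claimed "last", a second attempt to register as "last" fails. An error that an earlier handler resolves never reaches the default abort handler.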

This commit was SVN r28852.
2013-07-19 01:08:53 +00:00
Ralph Castain
6c50c8167c Fix pmi-1 compile when no pmi2 is present
This commit was SVN r28849.
2013-07-18 22:45:08 +00:00
Ralph Castain
256034a3dc Sigh - fix a couple of spots I missed
Refs trac:3683

This commit was SVN r28843.

The following Trac tickets were found above:
  Ticket 3683 --> https://svn.open-mpi.org/trac/ompi/ticket/3683
2013-07-18 19:07:16 +00:00
Ralph Castain
fc3b777ef5 Cleanup a variable that isn't used if pmi2 support is available
Refs trac:3683

This commit was SVN r28841.

The following Trac tickets were found above:
  Ticket 3683 --> https://svn.open-mpi.org/trac/ompi/ticket/3683
2013-07-18 17:19:13 +00:00
Ralph Castain
92c6b806b9 Based on a patch submitted by Piotr Lesnicki of Bull, cleanup the PMI2 support. This has not been tested yet on multiple environments (e.g., Cray), so it needs more evaluation prior to moving to the 1.7 branch.
cmr:v1.7.3:reviewer=rhc

This commit was SVN r28837.
2013-07-18 14:46:07 +00:00