Jeff Squyres
c95215dfc2
oob_tcp: do not set KEEPALIVE on listening sockets
2015-05-20 17:28:45 -04:00
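A minimal sketch of the idea behind this change, assuming plain POSIX sockets: SO_KEEPALIVE is applied to a connected or accepted socket and never to the listening one. The helper name `enable_keepalive` is hypothetical and is not taken from the oob_tcp source.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: request keepalive probes on one socket.
 * Meant for connected/accepted sockets only; the listening socket
 * is deliberately left untouched. Returns 0 on success, -1 on error. */
static int enable_keepalive(int sd)
{
    int on = 1;

    if (sd < 0) {
        return -1;  /* never operate on an invalid descriptor */
    }
    return setsockopt(sd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}
```

A caller would invoke this on the descriptor returned by accept(), or after connect() succeeds, rather than right after listen().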
Jeff Squyres
32d81af35f
oob tcp: re-enable keepalive option for Mac
...
Plus very minor #if/#endif reduction.
2015-05-20 17:28:45 -04:00
rhc54
95c40e64b9
Merge pull request #584 from nkogteva/oob_ud_stress_test
...
oob ud: fixed a bug that prevented it from working with the QoS framework
2015-05-20 09:56:08 -06:00
Gilles Gouaillardet
dd28b1f680
orted/dfs: fix misc memory leaks
...
as reported by Coverity with CIDs 739887, 747706, 1196707-1196709 and 1269849
2015-05-20 13:09:46 +09:00
Ralph Castain
d3d3e73099
Per request from George, use defined(__APPLE__) instead of OPAL_HAVE_MAC. Don't try to close a negative socket
2015-05-15 07:13:42 -06:00
Ralph Castain
0a345d34e6
Plug the memory leak identified by George
2015-05-14 21:33:48 -06:00
Howard Pritchard
578430c36d
oob/alps: remove comment with personal reference
...
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-05-14 20:06:21 -07:00
Ralph Castain
8e30579e6e
The Mac appears to have problems with the keepalive support - once keepalive starts, the memory footprint soars. So disable keepalive on the Mac
2015-05-14 18:09:13 -06:00
Nadezhda Kogteva
d9dcf8352e
oob ud: fixed a bug that prevented it from working with the QoS framework (oob_stress_channel test)
2015-05-13 11:40:01 +03:00
Jeff Squyres
8e8d104520
oob ud: ibv_get_device_list()==NULL can mean no devices present
...
...which is not an error. Don't complain about it.
2015-05-12 10:54:39 -07:00
Jeff Squyres
8f941a6613
oob ud: better error msgs, tolerate systems without UD devices
...
It is perfectly ok to be on a system without UD devices.
Also, make some of the error messages better -- so that the user has a
clue about where the error messages are coming from, and what they
should do.
2015-05-11 13:11:51 -07:00
Mike Dubman
894ba28390
Merge pull request #559 from nkogteva/oob_ud
...
oob ud: made component more user adaptive; opal outputs were replaced by...
2015-05-11 21:09:28 +03:00
Ralph Castain
3cee4152fc
Fix the intercommunicator issue reported by Gilles. Instead of directly checking the reachability bitmap, ask the component if the proc is reachable when doing a send, as the component is the final arbiter in such cases. Recirculate any messages that a daemon is trying to send to avoid race conditions. Clean up listener sockets so we don't leak them
2015-05-11 09:16:25 -07:00
Howard Pritchard
3382d3ce61
ess/alps: remove unnecessary vpid calc
...
There was a redundant computation of the vpid
for orteds in the ess/alps rte_init
method. Keep the more efficient alps-based
method.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-05-09 20:07:38 -07:00
Ralph Castain
b5382c9bf9
Rework the OOB selection logic to allow a component (e.g., usock) to direct that it be the sole active component. Remove prior disqualifying code in the oob/tcp component as it was too restrictive - if usock wasn't able to run, it left apps with no way to communicate with their daemon. Have the local daemon check the global modex for the RML URI info of the local procs so it can route messages between them when tcp is the primary channel.
...
A few other minor cleanups included.
2015-05-08 11:15:21 -07:00
Ralph Castain
6e95bcd583
Fix typo in oob_tcp.c when IPv6 is enabled. Clean up a few other warnings, including a typo in coll_sm that prevented that component from registering its MCA params!
2015-05-07 21:05:08 -07:00
Gilles Gouaillardet
a80fda25d8
orte: rename the global variable component_map into orte_component_map
...
Thanks @goodell for pointing this out!
2015-05-08 10:11:59 +09:00
Gilles Gouaillardet
2e384a3b65
initialize common symbols from orte
...
A few uninitialized common symbols remain (generated by flex):
* orte/mca/rmaps/rank_file/rmaps_rank_file_lex.c: orte_rmaps_rank_file_leng
* orte/mca/rmaps/rank_file/rmaps_rank_file_lex.c: orte_rmaps_rank_file_text
* orte/util/hostfile/hostfile_lex.c: orte_util_hostfile_leng
* orte/util/hostfile/hostfile_lex.c: orte_util_hostfile_text
2015-05-08 10:11:58 +09:00
Ralph Castain
9cb2fcfa5c
Clean up the qos code when --enable-timings is given
2015-05-06 20:24:27 -07:00
Ralph Castain
01a9bdf4cf
Cleanup of oob/ud component
2015-05-06 19:48:42 -07:00
Ralph Castain
1f8de276de
Consolidate all the QOS changes into one clean commit
2015-05-06 19:48:42 -07:00
Ralph Castain
8e3f0b1d33
Ensure the --tree-spawn option is inside any parens from the sh and ksh shell support
2015-05-06 15:18:15 -07:00
Ralph Castain
0bb73645f0
Silence Coverity warning
2015-04-30 20:49:28 -07:00
Ralph Castain
7d1980ba83
Add the ability to specify the number of desired slots in the --host option. Just giving a host name => one slot (multiple copies of the name yield one slot per copy). Giving "foo:3" indicates you want three slots - a shorthand notation for saying "foo" three times. Giving "foo:*" indicates you want the topology to set the number of slots based on the orte_set_slots param.
2015-04-30 20:35:23 -07:00
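The --host syntax described in this commit can be illustrated with a small parser. This is only a sketch under stated assumptions: the function `parse_host_spec` and the sentinel `SLOTS_FROM_TOPOLOGY` are hypothetical names, not ORTE symbols, and the sketch handles a single host entry (repeated host names, which add one slot per copy, would be summed by the caller).

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sentinel meaning "let the topology decide" (the "foo:*" form). */
#define SLOTS_FROM_TOPOLOGY (-1)

/* Parse one --host entry: "foo" => 1 slot, "foo:3" => 3 slots,
 * "foo:*" => SLOTS_FROM_TOPOLOGY. Returns 0 on success, -1 if malformed. */
static int parse_host_spec(const char *spec, char *name, size_t name_len,
                           int *slots)
{
    assert(spec != NULL && name != NULL && slots != NULL);

    const char *colon = strchr(spec, ':');
    size_t len = (colon != NULL) ? (size_t)(colon - spec) : strlen(spec);

    if (len == 0 || len >= name_len) {
        return -1;              /* empty host name, or name does not fit */
    }
    memcpy(name, spec, len);
    name[len] = '\0';

    if (colon == NULL) {
        *slots = 1;             /* bare host name: one slot */
        return 0;
    }
    if (strcmp(colon + 1, "*") == 0) {
        *slots = SLOTS_FROM_TOPOLOGY;
        return 0;
    }

    char *end = NULL;
    long n = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || *end != '\0' || n <= 0 || n > INT_MAX) {
        return -1;              /* missing, non-numeric, or non-positive count */
    }
    *slots = (int)n;
    return 0;
}
```

With this sketch, "foo:3" yields the same slot count a caller would get by summing three separate "foo" entries.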
Ralph Castain
e26e7ad736
Better support automated tests for map, rank, and bind options
2015-04-30 14:01:13 -07:00
Ralph Castain
7d4f9970d8
Minor cleanup
2015-04-29 17:49:35 -07:00
Nadezhda Kogteva
01ce58391e
oob ud: made component more user adaptive; opal outputs were replaced by help messages.
2015-04-28 15:36:32 +03:00
Jeff Squyres
8fbf34b196
oob ud: put call to ibv_fork_init() before *all* ibv calls
...
Move the call to opal_common_verbs_fork_test() up before the call
to ibv_get_device_list() (just curious -- why not use
opal_ibv_get_device_list()?). This ensures that the call to
ibv_fork_init() is before *all* other ibv_* calls.
2015-04-24 14:19:06 -07:00
Ralph Castain
9104e81958
When --map-by node, we should be unbound. Also remove dead code due to copy/paste error.
2015-04-23 20:35:54 -07:00
Ralph Castain
5003be5c5c
If the user specifies a --map-by <foo> option, then default to bind-to <foo> unless they specify a bind-to option. If they map-by slot/node, then use the default policy based on num_procs.
2015-04-23 13:30:21 -07:00
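The defaulting rule in this commit message can be sketched as a small decision function. Everything here is hypothetical: the enum, its values, and `resolve_bind_to` are illustrative names, not ORTE code. The num_procs-based default is represented only by a sentinel because the message does not spell it out, and treating "no --map-by given" like slot/node is an assumption of the sketch.

```c
/* Hypothetical resource levels; LVL_UNSET means "option not given". */
typedef enum {
    LVL_UNSET,
    LVL_SLOT,
    LVL_NODE,
    LVL_CORE,
    LVL_SOCKET,
    LVL_NPROCS_DEFAULT   /* sentinel: defer to the num_procs-based default */
} level_t;

/* Decide the bind-to target from the user's --map-by and --bind-to. */
static level_t resolve_bind_to(level_t map_by, level_t user_bind_to)
{
    if (user_bind_to != LVL_UNSET) {
        return user_bind_to;          /* an explicit bind-to always wins */
    }
    if (map_by == LVL_UNSET || map_by == LVL_SLOT || map_by == LVL_NODE) {
        return LVL_NPROCS_DEFAULT;    /* slot/node: fall back to the default */
    }
    return map_by;                    /* otherwise bind to what we mapped by */
}
```

For example, mapping by socket with no bind-to option resolves to binding to socket, while mapping by socket with an explicit bind-to core resolves to core.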
Ralph Castain
d5e4fd059f
Ensure the binding and locale strings are always defined
2015-04-23 07:43:37 -07:00
Ralph Castain
cb7330a543
Get the output to line up properly
2015-04-23 07:38:51 -07:00
Jeff Squyres
79243aca4e
display-devel-map: minor output tweak
...
hwloc output can get fairly long, especially on machines with lots of
cores and/or hyperthreads. So put the Locale and Binding output on
separate lines.
2015-04-23 06:14:57 -07:00
Ralph Castain
58e646ccfd
Reduce confusion by having the devel-map display in the same format as report-bindings
2015-04-23 04:30:00 -07:00
Ralph Castain
43229d056e
Protect one more place from a NULL object
2015-04-20 18:45:57 -07:00
Jeff Squyres
11e8c2096b
plm rsh: assign some levels to the rsh PLM MCA params
2015-04-20 16:18:57 -07:00
Nathan Hjelm
359a282e7d
ess/singleton: MCA variable synonyms cannot currently have NULL for both framework and component
...
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-20 16:50:52 -06:00
Ralph Castain
e8387fcf88
Protect tools that can never run in distributed mode from getting confused by PMI.
2015-04-20 15:42:57 -07:00
Nathan Hjelm
45e053dbce
orte: use C99 subobject naming for component initialization
...
This commit helps future-proof orte components by initializing each
component member by name.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-18 10:29:58 -06:00
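For readers unfamiliar with the C99 feature this commit relies on, here is a minimal sketch of initializing a struct by member name (designated initializers). The struct `example_component_t` and its fields are hypothetical and much simpler than the real ORTE/MCA component structures.

```c
#include <stddef.h>

/* Hypothetical, simplified component descriptor (not the real MCA struct). */
typedef struct {
    const char *name;
    int major_version;
    int minor_version;
    int (*open)(void);
    int (*close)(void);
} example_component_t;

static int example_open(void)
{
    return 0;
}

/* Each member is set by name, so the initializer stays correct even if
 * fields are later reordered or new ones are inserted. Members that are
 * not named (here, .close) are zero-initialized, i.e. NULL. */
static const example_component_t example_component = {
    .name = "example",
    .major_version = 1,
    .minor_version = 0,
    .open = example_open,
};
```

A positional initializer would instead depend on the exact field order, which is what makes by-name initialization more robust against future struct changes.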
Ralph Castain
34b53ac3dc
Silence Coverity warnings
2015-04-18 07:48:22 -07:00
Ralph Castain
12bfb27161
Redo in cleaner form: Per request from Andy Rieb, add ability to pass PATH and LD_LIBRARY_PATH elements to ssh command
2015-04-17 16:11:37 -07:00
Nadezhda Kogteva - nadezhda.kogteva@itseez.com
c2678b0cc9
oob ud: fixes and parameter adjustment
2015-04-17 16:22:43 +03:00
Nathan Hjelm
3436f2917d
Merge pull request #449 from hjelmn/mca_base_update
...
mca/base update
2015-04-16 08:41:48 -06:00
Ralph Castain
d9c555b547
Revert "Per request from Andy Rieb, add ability to pass PATH and LD_LIBRARY_PATH elements to ssh command"
...
This reverts commit open-mpi/ompi@278324c52a .
Revert "Add the ability to pass args to the rsh/ssh command line"
This reverts commit open-mpi/ompi@6f227f8564 .
2015-04-16 08:03:14 -06:00
rhc54
79b9c50717
Merge pull request #535 from rhc54/topic/rsh
...
Add the ability to pass args to the rsh/ssh command line
2015-04-15 21:11:46 -06:00
Ralph Castain
278324c52a
Per request from Andy Rieb, add ability to pass PATH and LD_LIBRARY_PATH elements to ssh command
2015-04-15 20:30:04 -06:00
Ralph Castain
0e23f76eee
Fix comment
2015-04-15 20:09:14 -06:00
Ralph Castain
6f227f8564
Add the ability to pass args to the rsh/ssh command line
2015-04-15 20:07:13 -06:00
Howard Pritchard
283ef4c05d
oob/config: if --with-verbs=no, no ud
...
The oob/ud configure was not honoring the case
where ompi is configured with --with-verbs=no.
This fixes that problem.
Fixes #522
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-04-14 06:31:18 -07:00
Nathan Hjelm
113c890ccf
Merge pull request #520 from hjelmn/valgrind_cleanness
...
fix memory leaks and valgrind errors
2015-04-13 10:09:34 -06:00