Ralph Castain
095a8fa684
We don't need to know about non-fatal errors from setting socket options
2015-03-20 07:16:31 -07:00
Ralph Castain
a013f3059f
For scalability reasons, and to make life easier for the poor Cray-ites, don't bang on the system for the username - we'll just use the uid.
2015-03-19 21:24:13 -07:00
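A minimal sketch of the idea in the commit above, assuming the username was previously obtained through a password-database lookup (which can reach out to a network name service on large systems). The helper name `format_user_tag` is hypothetical and is not taken from the Open MPI source; it only shows that the numeric uid is available from a local call.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the Open MPI tree): write a per-user
 * tag into buf using only the numeric uid. getuid() is a local call
 * that cannot fail; it avoids a getpwuid()-style lookup, which may
 * query a remote name service. Returns the snprintf() result. */
int format_user_tag(char *buf, size_t len)
{
    return snprintf(buf, len, "%lu", (unsigned long) getuid());
}
```

A caller could then use the resulting string wherever a per-user directory or session name is needed.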
Howard Pritchard
990e9b47e0
Merge pull request #486 from hppritcha/topic/issue_484
...
orte/oob: implement alps oob component
2015-03-19 19:40:40 -06:00
Ralph Castain
ed5d10b816
Somehow slipped by - ensure we correctly count the cores
2015-03-19 17:56:18 -07:00
Ralph Castain
43a3baad5e
Ensure we use the first compute node's topology for mapping
...
Don't filter the topology by cpuset if you are mpirun until you know that no other compute nodes are involved. This deals with the corner case where mpirun is executing on a node of different topology from the compute nodes.
Simplify - don't mandate that all cpus in the given cpuset be present on every node. We can then run everything thru the filter as before, which ensures that any procs run on mpirun are also contained within the specified cpuset.
Correctly count the number of available PUs under each object when given a cpuset
Fix the default binding settings, and correctly count PUs when no cpuset is given
Ensure the binding policy gets set in all cases
2015-03-19 16:30:36 -07:00
Howard Pritchard
6054975913
oob/alps: add configure file for alps oob
...
Have to have alps rpms installed on a system
for alps component to build, even if separated
by a level of indirection.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-03-19 15:38:14 -07:00
Howard Pritchard
b1f31a4364
orte/oob: implement alps oob component
...
Implement an almost-do-nothing alps oob component.
When using aprun to launch a job on a Cray system,
there is no reason to need an oob system, since ompi
relies on Cray PMI for oob communication.
Fixes #484
2015-03-19 14:11:40 -07:00
lrrajesh
4dc75687e2
Notification msg: add severity to the output
2015-03-18 13:55:03 -07:00
Howard Pritchard
edf9e8ba8f
mtl/psm: coverity fixes
...
Fix CIDS 1270176 - 1270179
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-03-18 11:02:01 -06:00
Nathan Hjelm
ccba8ce856
Merge pull request #457 from hjelmn/mpit_fixes
...
mca/base: fix bugs in framework deregistration/re-registration
2015-03-18 08:37:49 -06:00
Mike Dubman
c68a0ba99b
Merge pull request #482 from nkogteva/master
...
grpcomm: fixed brks and rcd algorithms - added enough space for masks in...
2015-03-18 16:09:59 +02:00
Nadezhda Kogteva
7c25b4cea6
grpcomm: fixed brks and rcd algorithms - added enough space for masks in order to get them working at large scale.
2015-03-18 14:33:04 +02:00
Ralph Castain
50277fec76
Adjust MCA param
2015-03-17 19:46:31 -07:00
rhc54
b41d2ad6c4
Merge pull request #481 from rhc54/topic/slurm
...
Add new MCA parameter to support edge case with debugger at LLNL
2015-03-17 07:40:55 -07:00
rhc54
7f8fcb7fb7
Merge pull request #479 from rhc54/topic/rarp
...
Attempt to reduce the RARP traffic during definition of allocations
2015-03-17 07:40:35 -07:00
Ralph Castain
b01e8c1063
Include the FQDN version and non-stripped version of the hostname in our list of aliases as these (plus localhost) are the most common aliases we see.
2015-03-17 06:26:26 -07:00
Ralph Castain
d7d8ae46ed
We no longer pass the RML URI for procs launched via mpirun as the daemon has no need for that info.
2015-03-17 06:10:20 -07:00
Ralph Castain
3e32c360c7
Add new MCA parameter to support edge case with debugger at LLNL
2015-03-16 20:04:05 -07:00
Ralph Castain
a0487e014c
Further reduce the RARP load by removing getaddrinfo for IPv6 connections. Correct typo when checking return on inet_pton. Don't consider the TCP component for apps that are launched via mpirun as it will never be used.
2015-03-16 19:42:05 -07:00
Ralph Castain
5ae42c816e
Attempt to reduce the RARP traffic during definition of allocations
2015-03-16 16:26:40 -07:00
Jeff Squyres
1196069815
Merge pull request #476 from maxlevesque/master
...
🐛 correct path to configure file
2015-03-16 14:28:44 -07:00
rhc54
ee23b7f300
Merge pull request #477 from rhc54/topic/keepalive
...
Add keepalive support to the TCP OOB component
2015-03-16 14:18:38 -07:00
Ralph Castain
64d11f170a
Adjust the default keepalive interval. Refactor the code when setting keepalive options
2015-03-16 12:32:58 -07:00
Ralph Castain
4ded049cbc
Modify MCA param description
2015-03-16 11:57:32 -07:00
Ralph Castain
019bba5caf
Cleanup a bit - don't need to lookup the protocol number if we just use the right define
2015-03-16 11:54:51 -07:00
Maximilien Levesque
7bc3f2ce61
Merge pull request #1 from maxlevesque/maxlevesque-patch-1
...
correct path to configure file
2015-03-16 18:46:13 +01:00
Maximilien Levesque
748d38b48a
correct path to configure file
...
./configure changed to ../configure
2015-03-16 18:45:58 +01:00
Ralph Castain
69ac25bf55
Add support for TCP keepalive on inter-node sockets
2015-03-16 09:59:44 -07:00
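For readers unfamiliar with TCP keepalive, the following is a rough sketch of how it is typically enabled on a connected socket. It is not the code from this commit: the function name `enable_keepalive` and its parameters are hypothetical, and the `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` options are platform-dependent, so each is guarded. The `IPPROTO_TCP` define is passed directly as the option level, so no protocol-number lookup is needed.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical helper (not the Open MPI implementation): turn on
 * keepalive probes for socket sd. Returns 0 on success, -1 if any
 * setsockopt() call fails. */
int enable_keepalive(int sd, int idle_secs, int intvl_secs, int max_probes)
{
    int on = 1;

    /* Ask the kernel to probe an idle connection. */
    if (setsockopt(sd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0) {
        return -1;
    }
#if defined(TCP_KEEPIDLE)
    /* Idle time in seconds before the first probe is sent. */
    if (setsockopt(sd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof(idle_secs)) < 0) {
        return -1;
    }
#else
    (void) idle_secs;
#endif
#if defined(TCP_KEEPINTVL)
    /* Seconds between successive probes. */
    if (setsockopt(sd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &intvl_secs, sizeof(intvl_secs)) < 0) {
        return -1;
    }
#else
    (void) intvl_secs;
#endif
#if defined(TCP_KEEPCNT)
    /* Unanswered probes allowed before the connection is dropped. */
    if (setsockopt(sd, IPPROTO_TCP, TCP_KEEPCNT,
                   &max_probes, sizeof(max_probes)) < 0) {
        return -1;
    }
#else
    (void) max_probes;
#endif
    return 0;
}
```

The values passed for the idle time, interval, and probe count are a tuning choice; the ones used in any given deployment are not stated in this log.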
Ralph Castain
0cfb4f29aa
Silence compiler warning
2015-03-16 09:59:21 -07:00
Mike Dubman
7640507438
Merge pull request #472 from miked-mellanox/topic/fix_compile_warn
...
btl/openib: fix compiler warning, by HalR
2015-03-13 14:06:07 +02:00
Jeff Squyres
0166318966
opal_check_pmi: protect un-prefixed shell variables
...
Since there's unfortunately only a global namespace for shell
variables, we need to protect un-prefixed shell variables with
OPAL_VAR_SCOPE_PUSH/POP.
2015-03-13 04:48:31 -07:00
Jeff Squyres
4ab9e67832
hwloc external: portability updates
...
Change "test -a" to "&& test", and change foo="$bar" to foo=$bar. No
substantive code changes.
2015-03-13 04:40:09 -07:00
Jeff Squyres
4d63c88ed1
hwloc external: whitespace cleanup, no code changes
2015-03-13 04:40:05 -07:00
Mike Dubman
00784ae3ba
btl/openib: fix compiler warning, by HalR
2015-03-13 13:17:23 +02:00
Todd Kordenbrock
9350b06f7d
btl-portals4: fix compiler warnings
2015-03-12 20:34:04 -05:00
Todd Kordenbrock
515d9e8cc9
mtl-portals4: fix compiler warnings
2015-03-12 20:34:04 -05:00
Jeff Squyres
65a0e041ac
dl: need to use LIBADD, not LIBS
...
When we use LIBADD for static libraries, the dependent libraries get
propagated properly. For example, the dl/dlopen component will almost
certainly require the -ldl library; when using LIBS, that doesn't get
propagated elsewhere in the tree, but when using LIBADD, it does
(e.g., when linking opal_wrapper_compiler).
2015-03-12 15:01:14 -07:00
Ryan Grant
6f76984a3c
Merge pull request #470 from tkordenbrock/topic/update-portals4-to-btl3
...
btl-portals4: implement the BTL 3.0 interface
2015-03-12 15:34:05 -06:00
Jeff Squyres
a1daa39425
libfabric: update to GitHub libfabric 90ac5a258418e
...
Update to latest upstream GitHub libfabric in order to fix some usnic
bugs.
2015-03-12 13:23:32 -07:00
Todd Kordenbrock
d1656347c8
btl-portals4: implement the BTL 3.0 interface
2015-03-12 14:19:44 -05:00
adrianreber
714d9aa67e
Merge pull request #348 from adrianreber/topic/orte_cr_continue_like_restart
...
Topic/orte cr continue like restart
2015-03-12 14:54:02 +01:00
Mike Dubman
b4d6420797
Merge pull request #468 from alinask/topic/fix_yalla_mxm_cov
...
MTL_MXM/PML_YALLA: fix coverity issues.
2015-03-12 13:22:44 +02:00
Alina Sklarevich
28586caecf
MTL_MXM/PML_YALLA: fix coverity issues.
2015-03-12 11:49:22 +02:00
Nathan Hjelm
695dcd5a28
oob/ud: fix compiler warning
2015-03-11 10:53:32 -06:00
Howard Pritchard
da85d5fc0a
Merge pull request #467 from hppritcha/topic/minor_fcoll_static_coverity_fix
...
fcoll/static: minor fix for coverity
2015-03-11 10:28:05 -06:00
Nathan Hjelm
fd78491768
Merge pull request #451 from elenash/master
...
fix: mca_base_env_var mca parameter is never handled if it's set from am...
2015-03-11 09:54:25 -06:00
Nathan Hjelm
ce6caab2a7
Merge pull request #463 from hjelmn/cuda_async
...
btl/openib: cuda: fix CUDA-aware support with async copy
2015-03-11 09:52:48 -06:00
Howard Pritchard
66fee3bd18
fcoll/static: minor fix for coverity
...
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-03-11 09:11:49 -06:00
Jeff Squyres
c61dd4d56f
usnic: each err eq entry reports *1* completion
...
Actually, the return from fi_eq_readerr() only indicates a *single*
error completion (not err_entry.data completions).
2015-03-11 08:07:20 -07:00
Ralph Castain
2de5cd6e5f
Ensure we don't install the libevent internal headers
2015-03-11 07:35:20 -07:00