Commit graph

4733 commits

Author SHA1 Message Date
Ralph Castain
095a8fa684 We don't need to know about non-fatal errors from setting socket options 2015-03-20 07:16:31 -07:00
Ralph Castain
a013f3059f For scalability reasons, and to make life easier for the poor Cray-ites, don't bang on the system for the username - we'll just use the uid. 2015-03-19 21:24:13 -07:00
Howard Pritchard
990e9b47e0 Merge pull request #486 from hppritcha/topic/issue_484
orte/oob: implement alps oob component
2015-03-19 19:40:40 -06:00
Ralph Castain
43a3baad5e Ensure we use the first compute node's topology for mapping
Don't filter the topology by cpuset if you are mpirun until you know that no other compute nodes are involved. This deals with the corner case where mpirun is executing on a node of different topology from the compute nodes.

Simplify - don't mandate that all cpus in the given cpuset be present on every node. We can then run everything thru the filter as before, which ensures that any procs run on mpirun are also contained within the specified cpuset.

Correctly count the number of available PUs under each object when given a cpuset

Fix the default binding settings, and correctly count PUs when no cpuset is given

Ensure the binding policy gets set in all cases
2015-03-19 16:30:36 -07:00
Howard Pritchard
6054975913 oob/alps: add configure file for alps oob
Have to have alps rpms installed on a system
for alps component to build, even if separated
by a level of indirection.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-03-19 15:38:14 -07:00
Howard Pritchard
b1f31a4364 orte/oob: implement alps oob component
Implement an almost-do-nothing alps oob component.
When using aprun to launch a job on a Cray system,
there is no reason to need an oob system, since ompi
relies on Cray PMI for oob communication.

Fixes #484
2015-03-19 14:11:40 -07:00
Nadezhda Kogteva
7c25b4cea6 grpcomm: fixed brks and rcd algorithms - added enough space for masks in order to get them working in the large scale. 2015-03-18 14:33:04 +02:00
Ralph Castain
50277fec76 Adjust MCA param 2015-03-17 19:46:31 -07:00
rhc54
b41d2ad6c4 Merge pull request #481 from rhc54/topic/slurm
Add new MCA parameter to support edge case with debugger at LLNL
2015-03-17 07:40:55 -07:00
Ralph Castain
b01e8c1063 Include the FQDN version and non-stripped version of the hostname in our list of aliases as these (plus localhost) are the most common aliases we see. 2015-03-17 06:26:26 -07:00
Ralph Castain
d7d8ae46ed We no longer pass the RML URI for procs launched via mpirun as the daemon has no need for that info. 2015-03-17 06:10:20 -07:00
Ralph Castain
3e32c360c7 Add new MCA parameter to support edge case with debugger at LLNL 2015-03-16 20:04:05 -07:00
Ralph Castain
a0487e014c Further reduce the RARP load by removing getaddrinfo for IPv6 connections. Correct typo when checking return on inet_pton. Don't consider the TCP component for apps that are launched via mpirun as it will never be used. 2015-03-16 19:42:05 -07:00
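The commit above mentions a corrected check on the return value of inet_pton but does not say what the typo was. As a minimal sketch of the general pitfall, and not the actual ORTE code: inet_pton returns 1 on success, 0 when the string is not a valid address for the family, and -1 when the family is unsupported, so a plain "non-zero means success" test misreads the error case. The helper name is_ipv4_literal is hypothetical.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stddef.h>

/* Hypothetical helper (not from the ORTE sources): returns 1 if `s` is a
 * dotted-quad IPv4 literal, 0 otherwise. */
int is_ipv4_literal(const char *s)
{
    struct in_addr addr;

    assert(s != NULL);

    /* inet_pton() returns 1 on success, 0 for an unparsable string and
     * -1 for an unsupported address family. Only 1 means "converted". */
    return inet_pton(AF_INET, s, &addr) == 1;
}
```

Recognizing an address literal this way needs no resolver call, so a name such as "node01" is simply reported as "not a literal" without generating any lookup traffic.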
Ralph Castain
5ae42c816e Attempt to reduce the RARP traffic during definition of allocations 2015-03-16 16:26:40 -07:00
Ralph Castain
64d11f170a Adjust the default keepalive interval. Refactor the code when setting keepalive options 2015-03-16 12:32:58 -07:00
Ralph Castain
4ded049cbc Modify MCA param description 2015-03-16 11:57:32 -07:00
Ralph Castain
019bba5caf Cleanup a bit - don't need to lookup the protocol number if we just use the right define 2015-03-16 11:54:51 -07:00
Ralph Castain
69ac25bf55 Add support for TCP keepalive on inter-node sockets 2015-03-16 09:59:44 -07:00
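For readers unfamiliar with TCP keepalive, the following is a minimal sketch of how it is typically enabled on a socket, assuming a POSIX system. It is not the code from this commit: the function name enable_keepalive and its parameters are hypothetical, and the tuning options (TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT) are guarded because not every platform defines them. Failures on the tuning options are treated as non-fatal, and IPPROTO_TCP is used directly rather than looking up the protocol number at run time.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper (not from the ORTE sources): turn on keepalive probes
 * for socket `sd`. Returns 0 if SO_KEEPALIVE was set, -1 otherwise.
 * Errors from the optional tuning knobs are deliberately ignored. */
int enable_keepalive(int sd, int idle_secs, int intvl_secs, int probes)
{
    int on = 1;

    assert(sd >= 0);

    /* The only required step: ask the kernel to send keepalive probes. */
    if (setsockopt(sd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0) {
        return -1;
    }

#if defined(TCP_KEEPIDLE)
    /* Idle time in seconds before the first probe is sent. */
    (void)setsockopt(sd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_secs, sizeof(idle_secs));
#else
    (void)idle_secs;
#endif
#if defined(TCP_KEEPINTVL)
    /* Seconds between successive probes. */
    (void)setsockopt(sd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_secs, sizeof(intvl_secs));
#else
    (void)intvl_secs;
#endif
#if defined(TCP_KEEPCNT)
    /* Number of unanswered probes before the connection is dropped. */
    (void)setsockopt(sd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes));
#else
    (void)probes;
#endif

    return 0;
}
```

A caller would invoke this once per connected inter-node socket, for example enable_keepalive(sd, 60, 10, 5); the specific values here are illustrative only.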
adrianreber
714d9aa67e Merge pull request #348 from adrianreber/topic/orte_cr_continue_like_restart
Topic/orte cr continue like restart
2015-03-12 14:54:02 +01:00
Nathan Hjelm
695dcd5a28 oob/ud: fix compiler warning 2015-03-11 10:53:32 -06:00
Adrian Reber
c08e234af7 FT: fix compilation using --with-ft (5/5)
Enabling the FT code breaks compilation (again). This series
tries to fix the compiler errors. This is again only fixing
the compiler errors without any warranty that the result
might actually support FT again.

With the changes introduced in the previous patches in this series
some goto constructs for cleanup are no longer necessary and removed.
2015-03-11 14:23:33 +01:00
Adrian Reber
8ba41a834a FT: fix compilation using --with-ft (4/5)
Enabling the FT code breaks compilation (again). This series
tries to fix the compiler errors. This is again only fixing
the compiler errors without any warranty that the result
might actually support FT again.

This patch tries to handle the new xcast semantic.
2015-03-11 14:23:33 +01:00
Adrian Reber
1c5a8df724 FT: fix compilation using --with-ft (2/5)
Enabling the FT code breaks compilation (again). This series
tries to fix the compiler errors. This is again only fixing
the compiler errors without any warranty that the result
might actually support FT again.

The FT code used barrier mechanisms which have been removed
with aec5cd08bd. This patch replaces
all those different barriers with opal_pmix.fence(NULL, 0);
I am not sure this is completely correct but at least a starting
point for a review.
2015-03-11 14:23:33 +01:00
Adrian Reber
f45dd069bd FT: fix compilation using --with-ft (1/5)
Enabling the FT code breaks compilation (again). This series
tries to fix the compiler errors. This is again only fixing
the compiler errors without any warranty that the result
might actually support FT again.

This first patch moves orte_cr_continue_like_restart from ORTE
to opal_cr_continue_like_restart in OPAL. This only leaves three
calls from OPAL to ORTE in the FT code. As it is not yet 100%
clear how to handle these calls the code orte_sstore.set_attr()
has been #ifdef'd out for now.
2015-03-11 14:23:33 +01:00
Gilles Gouaillardet
a69d935d55 oob/tcp: fix misc issues
as reported by Coverity with CIDs 70726, 710564,
1196630, 1269805, 1269803, 1269932
2015-03-10 19:32:01 +09:00
Gilles Gouaillardet
dc0bc756dc iof/base: fix misc memory leak
as reported by Coverity with CID 1196732
2015-03-10 14:37:53 +09:00
Jeff Squyres
a026456bef (orte|ompi|oshmem)*info tools: convert to opal_dl interface
Note that this commit removes option:lt_dladvise from the various
"info" tools output.  This technically breaks our CLI "ABI" because
we're not deprecating it / replacing it with an alias to some other
"info" tool output.

Although the dl/libltdl component contains a "have_lt_dladvise" MCA
var that contains the same information, the "option:lt_dladvise"
output from the various "info" tools is *not* an MCA var, and
therefore we can't alias it.  So it just has to die.
2015-03-09 08:18:13 -07:00
Gilles Gouaillardet
59be12b260 filem/raw: fix misc memory leaks
as reported by Coverity with CIDs 716815, 716817, 720760,
1196703, 1196704, 1196746
2015-03-09 19:56:20 +09:00
Gilles Gouaillardet
2ab9a411f8 plm/base: fix misc memory leaks
as reported by Coverity with CIDs 1196733 and 1196745
2015-03-09 16:25:07 +09:00
Gilles Gouaillardet
fa10025843 ras/slurm: fix misc memory leaks
as reported by Coverity with CIDs 968580 and 1196723-1196727
2015-03-09 15:58:51 +09:00
Gilles Gouaillardet
eae39bd948 ras/simulator: fix misc memory leaks
as reported by Coverity with CIDs 710647, 714133 and 714134
2015-03-09 15:52:29 +09:00
Gilles Gouaillardet
4c0eb11e08 orterun: fix misc errors
as reported by Coverity with CIDs 70700, 71039, 710651
2015-03-09 11:57:18 +09:00
Gilles Gouaillardet
33841361c0 orte-clean: use pclose instead of fclose
as reported by Coverity with CID 1287029
2015-03-09 11:17:59 +09:00
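The rule behind this fix is that a stream opened with popen must be closed with pclose, which also waits for the child process and returns its status; fclose is not valid for such a stream. A minimal sketch follows, assuming a POSIX system. The helper count_command_lines is hypothetical and is not taken from orte-clean.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from orte-clean): run `cmd` through the shell and
 * count the newline characters it writes to stdout.
 * Returns the count, or -1 if popen() or pclose() fails. */
int count_command_lines(const char *cmd)
{
    FILE *fp;
    int ch;
    int lines = 0;

    assert(cmd != NULL);

    fp = popen(cmd, "r");
    if (fp == NULL) {
        return -1;
    }

    while ((ch = fgetc(fp)) != EOF) {
        if (ch == '\n') {
            lines++;
        }
    }

    /* Streams from popen() are closed with pclose(), which waits for the
     * child to exit; using fclose() here would be incorrect. */
    if (pclose(fp) == -1) {
        return -1;
    }
    return lines;
}
```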
Elena
6c6fe75c7b added one more time interval for barrier to pmix unit test 2015-03-06 10:33:14 +02:00
Ralph Castain
64ec498a20 Add a declspec 2015-03-05 19:48:27 -08:00
Ralph Castain
eaa666bd57 Instantiate debug output variable 2015-03-05 12:25:49 -08:00
Ralph Castain
7ce0a9931c Updates to the notifier interfaces to support system events 2015-03-05 10:39:25 -08:00
Gilles Gouaillardet
7de3f35b90 pml/rsh: fix misc memory leaks
as reported by Coverity with CIDs 71091, 71230, 71231, 72274, 72389,
1196718 and 1196719
2015-03-05 20:03:37 +09:00
Gilles Gouaillardet
33352e9506 schizo: fix misc memory leak
as reported by Coverity with CID 1196722
2015-03-05 14:06:18 +09:00
Gilles Gouaillardet
89806c6261 orte/util: fix memory leaks
as reported by Coverity with CIDs 70845, 71855, 710652,
1196738, 1196739, 1196757, 1196758, 1269863 and 1269883
2015-03-05 14:06:18 +09:00
Gilles Gouaillardet
4e7b5240e4 orte/tools: fix misc memory leaks
as reported by Coverity with CIDs 70700, 71039, 71854, 72384 and 710651
2015-03-05 14:06:18 +09:00
Gilles Gouaillardet
d1b2f043ff fix misc memory leaks
as already reported by Coverity with CIDs
71818, 71819, 72250, 715767, 1196749 and 1274002
2015-03-05 13:58:05 +09:00
Gilles Gouaillardet
42f5a36ee3 rmaps/seq: fix misc memory leaks
as reported by Coverity with CIDs 1269886 and 1269887
2015-03-02 15:31:11 +09:00
Gilles Gouaillardet
0c7a2846d1 rmaps/rank_file: fix misc memory leaks
as reported by Coverity with CIDs 72250 and 1196774
2015-03-02 15:31:11 +09:00
Gilles Gouaillardet
c15b919635 rmaps/lama: fix misc memory leaks
as reported by Coverity with CIDs 719263, 719264, 1196712 and 1269842
2015-03-02 15:31:11 +09:00
Gilles Gouaillardet
456baeb71b rmaps/base: fix misc memory leaks
as reported by Coverity with CIDs 1196751, 1196754, 1196755 and 1269866
2015-03-02 15:31:11 +09:00
Gilles Gouaillardet
d8f3b378b3 orte/oob: fix misc memory leaks
as reported by Coverity with CIDs 1196748, 1196749 and 1269895
2015-03-02 15:31:11 +09:00
Jeff Squyres
336626dafe spelling: trivial spelling fix
s/interupted/interrupted/gi
2015-02-27 18:30:43 -08:00
Gilles Gouaillardet
ab78c7f54a orted/pmix: fix misc resource leak
as reported by Coverity with CID 1269844
2015-02-27 19:25:55 +09:00
Mike Dubman
dbc15009b6 Merge pull request #415 from alinask/topic/fix_fork_support_flow
Fix the calls to ibv_fork_init and remove btl_openib_want_fork_support.
2015-02-26 21:50:11 +02:00