1
1
Граф коммитов

4285 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
919d7fcf49 We cannot use OFI to determine when daemons can finalize as we don't see the "sockets" go away. So always use the OOB for the mgmt conduit - this provides the necessary termination signal AND ensures that IOF and other mgmt messages go solely across TCP.
Cleanup the way we look for matching OFI addresses by using the opal_net_samenetwork helper function. This now works for multi-network environments, but only using the socket provider

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-07 13:51:30 -07:00
Ralph Castain
bd1793ad17 Get the pmix/ext2x component to work. Fix a minor problem in the libevent external component.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-06 20:06:28 -07:00
Ralph Castain
93cf3c7203 Update OPAL and ORTE for thread safety
(I swear, if I look this over one more time, I'll puke)

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-06 12:30:57 -07:00
Ralph Castain
a28eaf914a Silence warnings when terminating
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-05 13:53:07 -07:00
Ralph Castain
8f526968c2 Do not hang if we cannot relay messages. Eliminate extra error log message
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-05 06:35:19 -07:00
Ralph Castain
51b4078b70 Merge pull request #3648 from rhc54/topic/ofi
Clean up the conduit open code so we return detectable errors when co…
2017-06-02 18:08:55 -07:00
Ralph Castain
e884cbf5f5 Even though the ofi component doesn't do any routing itself, the rest of the code base (e.g., grpcomm) needs to know what routing module this component is using. So set it to the "direct" module, and don't allow ofi to be used if that module isn't available.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-02 15:47:25 -07:00
Ralph Castain
ba9a6078c2 Add ability to select transport, and only compare the first one in the conduit list for a match. This lets you select which conduit to use for OFI - if you set "-mca rml_ofi_transports ethernet" you'll pickup the mgmt conduit. If you set "-mca rml_ofi_transports fabric", you'll get the coll conduit
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-02 14:31:23 -07:00
Jeff Squyres
af9565ec25 ess: add missing <signal.h> header
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-06-02 14:11:40 -07:00
Ralph Castain
066d5eedce Shift the signal forwarding code to ess/base so it can be available to more than just the hnp component. Extend the slurm component to use it so that any signals given directly to the daemons by their slurmstepd get forwarded to their local clients
Check for NULL

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-02 10:59:14 -07:00
Ralph Castain
6b3bbd30c5 Clean up the conduit open code so we return detectable errors when conduit not opened.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-02 10:40:51 -07:00
Ralph Castain
2ab4f93f6a Instead of "forced_terminate" just quietly causing the daemon to disappear, let's at least attempt to let the user know where the problem occurred.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-06-02 08:28:16 -07:00
anandhi
6ddb487744 Cleaned up the send_msg(), moved checking for send to self into the send_nb()
and send_buffer_nb()
	modified:   orte/mca/rml/ofi/rml_ofi_send.c

Signed-off-by: Anandhi Jayakumar <anandhi.s.jayakumar@intel.com>
2017-06-01 17:50:54 -07:00
Ralph Castain
9d6b929894 Fix uninitialized variable. Set exit codes for failed launch so we get pretty error messages
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-31 07:38:37 -07:00
Ralph Castain
26e7515a5e Don't sweat the "sync" settings on file descriptors as those flags aren't apparently fully portable
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-30 20:37:26 -07:00
Ralph Castain
5d990b557c Reorg ordering so that bare executable names also are found
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-30 15:58:55 -07:00
Ralph Castain
321abfc8c6 Fix cwd and preload-binary options
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-30 14:07:22 -07:00
Ralph Castain
ad108ba44d Fix the DVM
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-30 11:42:42 -07:00
Ralph Castain
9a8811a246 Ensure that data from a job that was stored in ompi-server is purged once that job completes. Cleanup a few typos. Silence a Coverity warning
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-30 09:43:01 -07:00
Ralph Castain
f3ab326b4a Add some debug code for detecting leaking file descriptors. At the end of each job (and if MCA param is set), have each daemon compute the number of open fds and their characteristics and print a summary
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-29 11:25:20 -07:00
Ralph Castain
87201a80ff Silence coverity warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-27 11:45:53 -07:00
Ralph Castain
8c2a06477c Fix ompi-server operations
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-26 08:57:55 -07:00
Ralph Castain
657e701c65 Add debug verbosity to the orte data server and pmix pub/lookup functions
Start updating the various mappers to the new procedure. Remove the stale lama component as it is now very out-of-date. Bring round_robin and PPR online, and modify the mindist component (but cannot test/debug it).

Remove unneeded test

Fix memory corruption by re-initializing variable to NULL in loop

Resolve the race condition identified by @ggouaillardet by resetting the
mapped flag within the same event where it was set. There is no need to
retain the flag beyond that point as it isn't used again.

Add a new job attribute ORTE_JOB_FULLY_DESCRIBED to indicate that all the job information (including locations and binding) is included in the launch message. Thus, the backend daemons do not need to do any map computation for the job. Use this for the seq, rankfile, and mindist mappers until someone decides to update them.

Note that this will maintain functionality, but means that users of those three mappers will see large launch messages and less performant scaling than those using the other mappers.

Have the mindist module add procs to the job's proc array as it is a fully described module

Protect the hnp-not-in-allocation case

Per path suggested by Gilles - protect the HNP node when it gets added in the absence of any other allocation or hostfile

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-25 18:41:27 -07:00
Gilles Gouaillardet
22ab73cb1a Merge pull request #3471 from ggouaillardet/topic/execve_cmd
odls: fix handling of the orte fork agent
2017-05-15 15:07:39 +09:00
Ralph Castain
b527c40dae Remove debug
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-12 12:41:36 -07:00
Ralph Castain
23af6c9d02 Merge pull request #3519 from rhc54/topic/nolocal
Fix --nolocal
2017-05-12 09:57:52 -07:00
Ralph Castain
45bbd598c1 Fix --nolocal
Fix the --nolocal option by ensuring we always check/remove the HNP from the list of available nodes if the flag is set
Ensure that the HNP node is included as available when nothing else is given

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-12 09:03:26 -07:00
Ralph Castain
29e083bffd Fix total_slots_allocated computation
On unmanaged allocations, we need to update the total_slots_allocated once the daemons have been launched and "discovered" their topology

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-12 08:21:52 -07:00
Ralph Castain
9164afbb08 When a daemon force-terminates, we don't get the show_help message it was trying to send because the message is at a lower priority than the termination event. Resolve this by putting the oob in its own progress thread. Also, use only that one thread by default - if someone needs more progress threads in the OOB, they can use the MCA param to get them.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-11 06:52:55 -07:00
Ralph Castain
911961ee21 Sigh - remove debug
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-10 11:26:42 -07:00
Ralph Castain
2d93d15aa7 Merge pull request #3502 from rhc54/topic/cisco
Fix nidmap computation to deal with hetero nodes
2017-05-10 11:21:12 -07:00
Ralph Castain
50646b07ce Update the RML OFI by copying the updated files from @anandhis branch
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-10 09:17:06 -07:00
Ralph Castain
442e307a6e Fix the nidmap computation to deal with hetero nodes
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-10 08:43:28 -07:00
Ralph Castain
ef0e0171c9 Implement the changes required to support cross-library coordination. Update PMIx to support intra-process notifications and ensure that we always notify ourselves for events. Add a new ompi/interlib directory where cross-lib coordination code can go, and put the code to declare ourselves there (called from ompi_mpi_init.c).
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-08 10:04:50 -07:00
Gilles Gouaillardet
16fc0996e6 odls: fix handling of the orte fork agent
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-05-08 16:07:13 +09:00
Ralph Castain
180809f2ef Do not pass topologies during tree spawn of daemons as there is no way the HNP can know the backend topologies at that point. Any needed topologies will be sent along with the launch_apps command
Do not pass param file MCA params if the user has requested that no param files be read - required when trying to avoid launch time penalties from large numbers of processes reading default param files. The daemon picks them up and passes them along anyway, so it isn't clear what value we gain from having them all read the defaults

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-07 21:14:43 -07:00
Ralph Castain
a143800bce Enable full operations under SLURM on Cray systems by co-locating a daemon with mpirun when mpirun is executing on a compute node in that environment. This allows local application procs to inherit their security credential from the daemon as it will have been launched via SLURM
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-06 19:08:50 -07:00
Ralph Castain
3a434d75d6 By default, use the system default snd/recv buffer sizes
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-05-05 09:58:05 -07:00
Gilles Gouaillardet
57b4144e57 orte: use compression for ORTE_DAEMON_REPORT_TOPOLOGY_CMD answer
Refs open-mpi/ompi#3414

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-27 17:21:59 +09:00
Gilles Gouaillardet
49cd40b2df compress the topology sent by the first orted
Refs open-mpi/ompi#3414

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-27 16:20:11 +09:00
Gilles Gouaillardet
c38ef3d46f oob/tcp: fix short writev handling in send_msg()
Fixes open-mpi/ompi#3414

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-27 10:24:38 +09:00
Howard Pritchard
462342d148 Merge pull request #3311 from hppritcha/topic/libfabric_moves_to_ofi
common/libfabric: move libfabric to ofi
2017-04-21 07:50:38 -06:00
Howard Pritchard
841192645b common/libfabric: move libfabric to ofi
This PR renames the common library for OFI libfabric from
libfabric to ofi.  There are a number of reasons this
is good to do:

1) its shorter and replaces 9 characters with three for
   function names for what may eventually be a fairly extensive interface
2) OFI is the term used for MTL and RML components that use
   the OFI libfabric interface
3) A planned OSC component will also use the OFI term.
4) Other HPC libraries that can use OFI libfabric tend to use
   the term "ofi" internally and also in their configure options
   relevant to OFI libfabric (i.e. MPICH/CH4, Intel MPI, Sandia SHMEM)

There seem to be comments in places in the Open MPI source
code that indicate that this common library will be going away.
Far from it as we will want to be able to share things like
AV objects between OMPI and possibly OSHMEM components that
use the OFI libfabric interface.

This PR also adds a synonym to the --with-libfabric(-libdir)
configury options: --with-ofi and with-ofi-libdir.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-04-20 13:07:16 -06:00
Nathaniel Graham
34b4aeb17f Merge pull request #3339 from nrgraham23/mpirun_help_improvements
Additional mpirun --help changes
2017-04-19 14:05:07 -06:00
Nathaniel Graham
01312b2f90 Additional mpirun --help changes
This commit recategorizes several mpirun arguments,
and moves the information for mpirun --help arguments
to the bottom of the general help message.  I also
added the OPAL_CMD_LINE_OTYPE field to two commands
that were missed initially because they were not
in the same area as the others.

Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
2017-04-19 11:43:45 -06:00
Howard Pritchard
3918b7a796 Merge pull request #3213 from hppritcha/topic/remove_loadleveer
orte/ras: remove loadleveler support
2017-04-18 09:18:54 -06:00
Ralph Castain
bb1aaa3286 Use the node index to compare to daemon vpid when identifying procs to bind
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-14 02:37:25 -07:00
Ralph Castain
67156556ce On behalf of Josh, ensure we flag that the child is no longer alive since we are killing it with SIGKILL
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-13 21:07:26 -07:00
Ralph Castain
1585854335 Minor coverity cleanups
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-12 19:31:35 -07:00
Ralph Castain
0500cc1c66 Update the debugger launch code to reflect the new backend mapping method.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-12 13:31:18 -07:00
Ralph Castain
539f71d0cc Merge pull request #3310 from marksantcroos/fix/alps_wdir
Bring ALPS ODLS up to par regarding wdir.
2017-04-11 17:30:04 -07:00
Mark Santcroos
27fa8aabd6 Hardcode basename to "orted" for error reporting.
Signed-off-by: Mark Santcroos <mark.santcroos@rutgers.edu>
2017-04-11 18:59:23 -04:00
Mark Santcroos
af3a6e1a29 Verify that the chdir(2) succeeds.
Signed-off-by: Mark Santcroos <mark.santcroos@rutgers.edu>
2017-04-12 00:37:37 +02:00
Ralph Castain
97e38e6d84 Move a free to a little later in case the verbose output needs it
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-11 11:21:12 -07:00
Mark Santcroos
36ac54b5d8 Bring ALPS ODLS up to par regarding wdir.
Signed-off-by: Mark Santcroos <mark.santcroos@rutgers.edu>
2017-04-10 08:15:07 -04:00
Ralph Castain
95ae0d1df3 Cleanup timing macros for portability across compilers. Rename the --enable-timing configure option to be --enable-pmix-timing so it doesn't pickup external timing requests. Remove a stale function reference in PMIx so it can compile with timing enabled.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-10 12:56:38 +06:00
Artem Polyakov
482d7c9322 opal/timing: remove RML timings
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-04-07 21:16:21 +06:00
Artem Polyakov
79100de014 opal/timing: Remove oob tracing
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-04-07 21:16:21 +06:00
Ralph Castain
b33b4607df Correctly identify the source of the event when notifying of abnormal termination by a process
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-06 20:50:38 -07:00
Ralph Castain
a29ca2bb0d Enable slurm operations on Cray with constraints
Cleanup some errors in the nidmap code that caused us to send unnecessary topologies

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-06 08:58:06 -07:00
Ralph Castain
40ca43e157 Set the PARENT vpid for direct routed module
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-04 19:03:28 -07:00
Ralph Castain
9cb18b8348 Merge pull request #3280 from rhc54/topic/dvm
Fix the DVM by ensuring that all nodes, even those that didn't partic…
2017-04-04 18:15:33 -07:00
Ralph Castain
74863a0ea4 Fix the DVM by ensuring that all nodes, even those that didn't participate (i.e., didn't have any local children) in a job, clean up all resources associated with that job upon its completion. With the advent of backend distributed mapping, nodes that weren't part of the job would still allocate resources on other nodes - and then start from that point when mapping the next job. This change ensures that all daemons start from the same point each time.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-04 17:31:38 -07:00
Nathaniel Graham
7063f3021f Merge pull request #3231 from nrgraham23/revamp_mpirun_help
mpirun --help output revamp
2017-04-04 12:32:20 -06:00
Nathaniel Graham
19e5d15491 mpirun --help output revamp
This commit modifies the output from the mpirun --help
command.  The options have been split into groups, to
make the output smaller and more readable.  The groups
are: general, debug, output, input, mapping, ranking,
binding, devel, compatibility, launch, dvm, and
unsupported. There is also a special "full" command
that can be used to get the old behaviour of printing
out all of the options.  Unsupported options may only
be seen with this full output.

This commit also adds a special case for the help
argument.  It makes it possible for the user to
enter 0 or 1 arguments instead of having to always
enter an argument.  This defaults to printing out
the "general" help options so the user can then
see what help arguments there are.

Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
2017-04-04 10:59:32 -06:00
Ralph Castain
393c4536eb Remove stale code line
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-04 08:13:15 -07:00
Ralph Castain
92c996487c Update how we pass the node regex so we pass _all_ nodes, even those without daemons. This allows the backend daemons to form a complete picture of the allocation. Include info on which nodes have daemons on them, and populate that info on the backend as well.
Set the daemons' state to "running" and mark them as "alive" by default when constructing the nidmap

Get the DVM running again

Fix direct modex by eliminating race condition caused by releasing data while sending it

Up the size limit before compressing

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-03 19:25:15 -07:00
Ralph Castain
583dbe954c Silence coverity dead-code warning
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-26 20:36:43 -07:00
Ralph Castain
ecc8000136 Silence a flood of warnings when compiling with gcc on Cray
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-24 13:37:11 -06:00
Ralph Castain
35f817911e Fix coverity issues
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-24 08:09:46 -07:00
Ralph Castain
ea84a53faa Merge pull request #3218 from rhc54/topic/pmix2
Update to include the PMIx 2.0 APIs for monitoring and job control.
2017-03-21 20:11:10 -07:00
Ralph Castain
d645557fa0 Update to include the PMIx 2.0 APIs for monitoring and job control. Include required integration, but leave the monitors off for now. Move the sensor framework out of ORTE as it is being absorbed into PMIx
Fix typo and silence warnings

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-21 17:47:08 -07:00
Ralph Castain
10d401b6ec Merge pull request #3217 from rhc54/topic/wdirs
Resolve a race condition for setting our working directory when fork/exec'ing application procs.
2017-03-21 17:39:54 -07:00
Ralph Castain
74fd2c30af Cleanup alps odls module
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-21 17:41:11 -06:00
Ralph Castain
f8e1e3bed3 Ensure we properly exit with error if we cannot map the job
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-21 15:15:32 -07:00
Ralph Castain
75684dc260 Resolve a race condition for setting our working directory when fork/exec'ing application procs. We have to ensure we do it after the fork occurs since we want to use multiple threads in the odls. Otherwise, the different threads are bouncing the entire process around.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-21 13:54:03 -07:00
Howard Pritchard
9350aa5d71 orte/ras: remove loadleveler support
Remove loadleveler as it is obsolescent and is no longer supported.

Fixes #3167

We'll wait for final check of whether or not loadleveler even
compiles/functions before merging this.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-03-21 10:32:28 -06:00
Ralph Castain
dc85e7fde7 Provide a little more help on the error messages when an executable isn't found so we have some better idea where we were looking for it. Don't double-report such errors. Ensure the ORTE_ERROR_NAME doesn't get a NULL back for the string name of an error code as that might cause some systems to segfault
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-17 09:54:37 -07:00
Howard Pritchard
1709febdea Merge pull request #3166 from hppritcha/topic/swat_state_orted_comp_warning
ORTED: swat another compiler warning
2017-03-15 08:40:59 -06:00
Ralph Castain
96d7d10c1d Merge pull request #3170 from rhc54/topic/reg
Ensure the backend daemons know if we are in a managed allocation and if the HNP was included in the allocation
2017-03-14 12:48:09 -07:00
Ralph Castain
61a71e25ef Ensure the backend daemons know if we are in a managed allocation and if
the HNP was included in the allocation

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-14 10:06:43 -07:00
Howard Pritchard
5daaf7f3fd ORTED: swat another compiler warning
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-03-14 08:41:51 -06:00
Ralph Castain
52c9e631de Silence Coverity warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-14 07:30:42 -07:00
Ralph Castain
b1a01d77ae Update the TM module to support regex passing
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-13 21:50:40 -07:00
Ralph Castain
bb574a41df Update launchers to get correct regex
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-13 11:21:44 -07:00
Ralph Castain
105fb152e1 Silence Coverity warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-13 08:38:51 -07:00
Ralph Castain
b9f5cab710 Add a minor debug statement
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-12 18:15:44 -07:00
Gilles Gouaillardet
23d44a5284 sensor/base: initialize orte_sensor_base global variable
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-03-13 09:39:43 +09:00
Ralph Castain
6d6bc9bd07 Update alps module to new APIs
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-12 09:43:07 -07:00
Ralph Castain
70591bf4dc Enable parallel fork/exec of local procs by providing the option of multiple odls progress threads
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-11 20:48:04 -08:00
Ralph Castain
ab50665222 Restore sensor framework
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-11 17:46:32 -08:00
Ralph Castain
c6bc3ccb76 Sync to latest PMIx master and PMIx reference server
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-11 12:50:38 -08:00
Howard Pritchard
f8183f71f7 rmaps/base: swat compiler warning
gcc was complaining about variables possibly used uninitialized

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-03-09 14:30:06 -06:00
Ralph Castain
48fc339718 Create an alternative mapping method that pushes responsibility
onto the backend daemons. By default, let mpirun only pack the app_context
info and send that to the backend daemons where the mapping will
be done. This significantly reduces the computational time on mpirun as it isn't
running up/down the topology tree computing thousands of binding
locations, and it reduces the launch message to a very small number of
bytes.

When running -novm, fall back to the old way of doing things
where mpirun computes the entire map and binding, and then sends
the full info to the backend daemon.

Add a new cmd line option/mca param --fwd-mpirun-port that allows
mpirun to dynamically select a port, but then passes that back to
all the other daemons so they will use that port as a static port
for their own wireup. In this mode, we no longer "phone home" directly
to mpirun, but instead use the static port to wireup at daemon
start. We then use the routing tree to rollup the initial
launch report, and limit the number of open sockets on mpirun's node.

Update ras simulator to track the new nidmap code

Cleanup some bugs in the nidmap regex code, and enhance the error message for not enough slots to include the host on which the problem is found.

Update gadget platform file

Initialize the range count when starting a new range

Fix the no-np case in managed allocation

Ensure DVM node usage gets cleaned up after each job

Update scaling.pl script to use --fwd-mpirun-port. Pre-connect the daemon to its parent during launch while we are otherwise waiting for the daemon's children to send their "phone home" rollup messages

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-07 20:43:12 -08:00
Ralph Castain
83199979ba Remove the stale opal/sec framework
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-02 15:41:56 -08:00
Ralph Castain
c757c3d260 Fix double-free in rml/ofi shutdown
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-01 11:53:46 -08:00
Jeff Squyres
fec519a793 hwloc: rename opal/mca/hwloc/hwloc.h -> hwloc-internal.h
Per a prior commit, the presence of "hwloc.h" can cause ambiguity when
using --with-hwloc=external (i.e., whether to include
opal/mca/hwloc/hwloc.h or whether to include the system-installed
hwloc.h).

This commit:

1. Renames opal/mca/hwloc/hwloc.h to hwloc-internal.h.
2. Adds opal/mca/hwloc/autogen.options to tell autogen.pl to expect to
   find hwloc-internal.h (instead of hwloc.h) in opal/mca/hwloc.
3. s@opal/mca/hwloc/hwloc.h@opal/mca/hwloc/hwloc-internal.h@g in the
   rest of the code base.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-02-28 07:48:42 -08:00
Ralph Castain
f054261590 Merge pull request #3027 from naughtont3/tjn-envvar-dvmuri
dvm: Add envvar 'ORTE_HNP_DVM_URI' to schizo:ompi
2017-02-27 06:56:44 -08:00
Ralph Castain
efc3a98ea6 Merge pull request #3031 from rhc54/topic/ofi
Add CPPFLAGS to build of rml/ofi component.
2017-02-25 11:23:03 -08:00
Ralph Castain
9f8f7f3189 Add CPPFLAGS to build of rml/ofi component.
Fix finalize to ensure we only destruct the msg queue list once.
Update platform file

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-25 09:17:41 -08:00
Thomas Naughton
006be92df5 dvm: Add envvar 'ORTE_HNP_DVM_URI' to schizo:ompi
Add ability to pass DVM URI purely via environment
to simplify invocation from command-line (e.g., start dvm,
export URI, mpirun w/o needing to add `--hnp` arg).
If user passes both envvar *and* cmdline, the cmdline wins.

Signed-off-by: Thomas Naughton <naughtont@ornl.gov>
2017-02-24 16:55:32 -05:00
Thomas Naughton
beb5b250bf orte dvm: debug fix for DVM early quit
Ensure that job errors do not cause the DVM to fail unless the failed job is the DVM itself.

Refs #2987, with improvements from Ralph

Signed-off-by: Thomas Naughton <naughtont@ornl.gov>
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-23 10:17:53 -05:00
Ralph Castain
22c88f5ab5 Fix launch_id matching of -hosts
Need to check the entire value instead of just the last N digits. Otherwise, "-host 15" will match nid0015, nid0115, and any other launch id ending in 15

It appears strtol can return either a NULL or a zero-length string, so check for both cases

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-20 07:03:53 -08:00
Ralph Castain
af7e2cc33b Merge pull request #3004 from jjhursey/topic/oob-tcp-timeout
oob/tcp: Adjust TCP keepalive default values
2017-02-19 14:28:01 -08:00
Ralph Castain
95bfc7b7c6 Merge pull request #2991 from jjhursey/fix/ibm/errmgr-help-msg
orte/errmgr: Improve help message on connection lost
2017-02-17 11:34:18 -08:00
Joshua Hursey
df0f8e95cd oob/tcp: Adjust TCP keepalive default values
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-02-17 11:02:25 -06:00
Howard Pritchard
b272f87926 Merge pull request #2968 from hjelmn/pmix_cray
pmix/cray: performance improvements and cleanup
2017-02-16 11:41:59 -07:00
Ralph Castain
0ae873de5c Fix a bug where we failed to compute #procs for nperXXX directives, thus resulting in an incorrect default binding
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-15 16:32:24 -08:00
Ralph Castain
223495325d Fix binding policy bug and support pe=1 modifier
Allow someone to specify the "pe=N" modifier to a mapping policy when N=1. This equates to just "bind-to core", but helps people who use a script to set the PE policy. Fix a bug where setting the binding policy left a lingering "if-supported" flag that shouldn't be there.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-15 14:55:17 -08:00
Joshua Hursey
c452f68495 orte/errmgr: Improve help message on connection lost
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-02-15 16:36:00 -05:00
Ralph Castain
68b53e2179 Fix comm_spawn by registering nspace info only when needed - either when we have local procs, or when job-level info is required by connecting jobs
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-14 19:47:56 -08:00
Josh Hursey
a17b547430 Merge pull request #2957 from jjhursey/topic/ibm/rsh-sigint-fix
plm/rsh: Fix signal handling for rsh launcher
2017-02-14 15:29:00 -06:00
Nathan Hjelm
1df6bdd30e schizo/alps: set orte_bound_at_launch when launched with aprun
Set the orte_bound_at_launch MCA variable. This resolves a launch
performance bug when using aprun to launch Open MPI processes. If
this variable is not set it can take minutes longer to launch with
high ppn.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-14 11:13:48 -07:00
Joshua Hursey
843fcca03c plm/rsh: Fix signal handling for rsh launcher
* Similar to the other launchers (i.e., slurm, alps) we need to put the
   children in a separate process group to prevent SIGINT (from a CTRL-C)
   from being delivered to the whole process group and prematurely
   killing the rsh/ssh connections to the remote daemons.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-02-14 08:54:17 -06:00
Ralph Castain
dee2d8646d Fix plm/rsh runtime check
Fix the check for rsh/ssh so we allow the check for SGE and LoadLeveler to occur if user doesn't specify their own launch agent. Fix a Coverity warning

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-13 16:54:03 -08:00
Nathan Hjelm
2c1980ae39 Merge pull request #2923 from hjelmn/oob_fix
oob/tcp: cleanup peers before event bases
2017-02-06 09:34:10 -07:00
Nathan Hjelm
b928a6b9ea ras/slurm: fix compile error due to missing header
On some systems this component fails to build due to the missing
netdb.h header.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-03 15:22:34 -07:00
Nathan Hjelm
1c4b735f5f oob/tcp: cleanup peers before event bases
This commit fixes an error in teardown where the event bases are town
down before the peer structures are released. This causes us to call
event_del on an invalid event base. At best this makes valgrind
complain and at worst this causes aborts or segvs.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-02-03 15:18:41 -07:00
Ralph Castain
b661275dba For performance, try to send the oob/tcp message a few times before dropping back into the event library
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-02 06:44:15 -08:00
Ralph Castain
50ca9fb66b Merge pull request #2893 from rhc54/topic/sim
Cleanup the ras simulator capability, and the relay route thru grpcomm
2017-02-01 16:17:40 -08:00
Ralph Castain
230d15f0d9 Cleanup the ras simulator capability, and the relay route thru grpcomm
direct. Don't resend wireup info if nothing has changed

Fix release of buffer

Correct the unpacking order

Fix the DVM - now minimized data transfer to it

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-01 15:01:58 -08:00
Ralph Castain
8bf3ac828c Correct the path to the ORTE data dir - allows master to be built with --no-ompi
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-01 07:30:18 -07:00
Howard Pritchard
db4039f565 ess/alps: fix problem in makefile
./autogen.pl --no-ompi doesn't work without this
fix when alps can be configured.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-01-31 21:56:16 -06:00
Ralph Castain
b59ae14a2a Fix static port and partial allocation operations
Fix static port wireup by recording the TCP port mpirun is using and correctly passing the regex of hosts to the daemons. Do a better job of closing sockets on failed connection attempts. Correctly identify the remote host in the associated error message.

Fix partial allocation operations by not attempting to set #slots on nodes that were not used, and thus don't have a daemon or topology assigned to them

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-28 10:09:44 -08:00
Ralph Castain
c803af5d3d Minor change to allow qrsh to tree spawn, if supported
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-27 16:34:08 -08:00
Ralph Castain
7c795f4416 If the HNP is going to request topology info, it cannot do so via a routed OOB message as the intervening daemons may not be ready. So disable routing until the VM is ready, and have daemons start routing as they receive the xcast launch msg (which includes the data they need to talk to their peers).
Do a little optimization and minimize recomputation of the routing plan.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-27 15:37:16 -08:00
Ralph Castain
d672fad849 Repair rsh/ssh tree spawn
Repair rsh/ssh tree spawn by unpacking and updating the nidmap in remote_spawn.

Add more specific error messages so the cause of a messaging problem is a little clearer. Remove some stale code. Ensure we stop trying to send a message after a few times.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-27 11:35:00 -08:00
Josh Hursey
2e64bf42fb Merge pull request #2810 from jjhursey/fix/ibm/stdiag-to-stdout
Extend options for stddiag routing
2017-01-26 14:29:16 -06:00
Nathan Hjelm
fe1c6bd881 Merge pull request #2840 from hjelmn/event_fix
verbs: remove extra event user increment/decrement operation
2017-01-26 07:30:24 -08:00
Ralph Castain
399de0738e Cleanup launch
Given that we only set OOB contact info from inside of events, or before we begin threaded operations (e.g., in the ess), allow set_contact_info to directly update the oob/base framework globals.

Correct the nidmap regex decompression routine.

Ensure that rank=1 daemon always sends back its topology as this is the most common use-case.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-25 22:06:09 -08:00
Nathan Hjelm
9f28c0af39 verbs: remove extra event user increment/decrement operation
Since the oob and connections systems do not work the same way they
did in older versions of Open MPI these operations are no longer
necessary. At best they do nothing and at worst they hurt performance
by making us enter the event library more often in opal_progress().

Fixes #2839

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-01-25 18:37:06 -07:00
Ralph Castain
2f4e87eae9 Have rank=1 daemon always send its topology back as this is the most common use-case
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-25 09:33:11 -08:00
Jeff Squyres
230bbc597d plm base: make sure to assign "node" early enough
Make sure to assign "node" before using it in ORTE_FLAG_SET.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-25 08:02:59 -08:00
Ralph Castain
184ccc8e91 Cleanup some code so it is clear that it is executing in an event. Ensure that peer event base is properly set on incoming connections
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-25 06:55:11 -08:00
Gilles Gouaillardet
ef10d3fd7b orte: add missing include file
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-25 16:15:20 +09:00
Joshua Hursey
0e9a06d2c3 orte/iof: Add app stderr to stdout redirection at source
* Add an MCA parameter to combine stdout and stderr at the source
   - `iof_base_redirect_app_stderr_to_stdout`
 * Aids in user debugging when using libraries that mix stderr with stdout

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-01-24 16:23:48 -06:00
Joshua Hursey
dcd9801f7c orte/iof: Add orte_map_stddiag_to_stdout option
* Similar to `orte_map_stddiag_to_stderr` except it redirects `stddiag`
   to `stdout` instead of `stderr`.
 * Add protection so that the user canot supply both:
   - `orte_map_stddiag_to_stderr`
   - `orte_map_stddiag_to_stdout`

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-01-24 16:22:59 -06:00
Ralph Castain
ef86707fbe Deprecate the --slot-list paramaeter in favor of --cpu-list. Remove the --cpu-set param (mark it as deprecated) and use --cpu-list instead as it was confusing having the two params. The --cpu-list param defines the cpus to be used by procs of this job, and the binding policy will be overlayed on top of it.
Note: since the discovered cpus are filtered against this list, #slots will be set to the #cpus in the list if no slot values are given in a -host or -hostname specification.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-24 13:33:22 -08:00
Ralph Castain
4e9364b9a4 Merge pull request #2794 from rhc54/topic/regs
Next step in reducing launch time
2017-01-24 03:19:57 -08:00
Ralph Castain
86ab751c5e Next step in reducing launch time: begin reducing the size of the launch message itself. Start by expressing the daemon map as a set of three regular expression strings. On an 8k cluster, this reduces the nidmap contribution from over 200kBytes to 21 bytes in size.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-23 19:54:47 -08:00
Gilles Gouaillardet
0bdc594b2e rml/base: plug a memory leak in orte_rml_API_recv_cancel()
simply return when the orte event thread has gone

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-24 09:12:47 +09:00
Ralph Castain
a61f7bdb26 Merge pull request #2780 from rhc54/topic/conn
Ensure we properly set the "shutting down" flag so connection drops by downstream peers are properly handled.
2017-01-23 06:40:28 -08:00
Ralph Castain
e7b12913b4 Ensure we properly set the "shutting down" flag so connection drops by downstream peers are properly handled.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-23 04:00:24 -08:00
Nathan Hjelm
954a4b7be3 oob/base: fix num_threads registration type
This commit fixes a bug in the registration of the num_threads MCA
variable. The variable is of type int and was being registered as
a boolean.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-01-22 14:02:34 -07:00
Ralph Castain
ac4fcd3f97 Ensure that oob/base level data is always accessed in the oob/base event thread. Make debruijn the default routed component
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-22 10:33:32 -08:00
Ralph Castain
6560617c04 Fix comm_spawn and orte-dvm by resetting all used "node mapped" flags after building the child list
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-22 05:55:53 -08:00
Ralph Castain
639cdd4f9d Add missing flag set to ensure nodes do not get double-added to job map.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-21 20:06:50 -08:00
Ralph Castain
be3ef77739 Improve packing efficiency by raising the initial buffer size and modifying the extension code. Flag if a job map has had its nodes added so we don't have to loop repeatedly to check it.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-21 14:03:19 -08:00
Ralph Castain
466cbd4d29 Rework the threading in oob/tcp so that daemons (including mpirun) use multiple progress threads to get messages out to their children, and so that the oob/base uses a separate one to setup sends. This allows the daemon cmd processor to execute in parallel with relay of messages, which significantly reduces launch times at scale
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-21 13:26:19 -08:00
Ralph Castain
668421b6ec Compress the xcast message if bigger than a defined size to further improve launch performance at scale
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-19 22:08:02 -08:00