Gilles Gouaillardet
57b4144e57
orte: use compression for ORTE_DAEMON_REPORT_TOPOLOGY_CMD answer
...
Refs open-mpi/ompi#3414
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-27 17:21:59 +09:00
Gilles Gouaillardet
49cd40b2df
compress the topology sent by the first orted
...
Refs open-mpi/ompi#3414
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-27 16:20:11 +09:00
Gilles Gouaillardet
c38ef3d46f
oob/tcp: fix short writev handling in send_msg()
...
Fixes open-mpi/ompi#3414
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-04-27 10:24:38 +09:00
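Note on the short writev fix above: writev(2) may return fewer bytes than requested, so the caller has to advance the iovec array past the bytes already sent before retrying. The sketch below is only an illustration of that general technique, not the code from this commit. The helper name writev_all is hypothetical, and it assumes a blocking descriptor, whereas the real oob/tcp send path is event-driven.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper (not an Open MPI function): write every byte
 * described by iov[0..iovcnt-1] to a blocking fd, resuming after short
 * writes. The iov array is modified in place as data is consumed.
 * Returns the total number of bytes written, or -1 on error. */
ssize_t writev_all(int fd, struct iovec *iov, int iovcnt)
{
    ssize_t total = 0;

    while (iovcnt > 0) {
        ssize_t n = writev(fd, iov, iovcnt);
        if (n < 0) {
            if (errno == EINTR) {
                continue;           /* interrupted before writing: retry */
            }
            return -1;
        }
        total += n;

        /* Skip every iovec that was written completely. */
        while (iovcnt > 0 && (size_t)n >= iov->iov_len) {
            n -= (ssize_t)iov->iov_len;
            iov++;
            iovcnt--;
        }

        /* Short write: advance into the partially written iovec. */
        if (iovcnt > 0) {
            iov->iov_base = (char *)iov->iov_base + n;
            iov->iov_len -= (size_t)n;
        }
    }
    return total;
}
```

On a non-blocking socket the same bookkeeping applies, but an EAGAIN result would mean returning to the event loop with the updated iovec state rather than looping.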
Howard Pritchard
462342d148
Merge pull request #3311 from hppritcha/topic/libfabric_moves_to_ofi
...
common/libfabric: move libfabric to ofi
2017-04-21 07:50:38 -06:00
Howard Pritchard
841192645b
common/libfabric: move libfabric to ofi
...
This PR renames the common library for OFI libfabric from
libfabric to ofi. There are a number of reasons this
is good to do:
1) it's shorter and replaces 9 characters with three for
function names for what may eventually be a fairly extensive interface
2) OFI is the term used for MTL and RML components that use
the OFI libfabric interface
3) A planned OSC component will also use the OFI term.
4) Other HPC libraries that can use OFI libfabric tend to use
the term "ofi" internally and also in their configure options
relevant to OFI libfabric (e.g. MPICH/CH4, Intel MPI, Sandia SHMEM)
There seem to be comments in places in the Open MPI source
code that indicate that this common library will be going away.
Far from it, as we will want to be able to share things like
AV objects between OMPI and possibly OSHMEM components that
use the OFI libfabric interface.
This PR also adds a synonym to the --with-libfabric(-libdir)
configury options: --with-ofi and --with-ofi-libdir.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-04-20 13:07:16 -06:00
Nathaniel Graham
34b4aeb17f
Merge pull request #3339 from nrgraham23/mpirun_help_improvements
...
Additional mpirun --help changes
2017-04-19 14:05:07 -06:00
Nathaniel Graham
01312b2f90
Additional mpirun --help changes
...
This commit recategorizes several mpirun arguments,
and moves the information for mpirun --help arguments
to the bottom of the general help message. I also
added the OPAL_CMD_LINE_OTYPE field to two commands
that were missed initially because they were not
in the same area as the others.
Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
2017-04-19 11:43:45 -06:00
Howard Pritchard
3918b7a796
Merge pull request #3213 from hppritcha/topic/remove_loadleveer
...
orte/ras: remove loadleveler support
2017-04-18 09:18:54 -06:00
Ralph Castain
bb1aaa3286
Use the node index to compare to daemon vpid when identifying procs to bind
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-14 02:37:25 -07:00
Ralph Castain
67156556ce
On behalf of Josh, ensure we flag that the child is no longer alive since we are killing it with SIGKILL
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-13 21:07:26 -07:00
Ralph Castain
1585854335
Minor coverity cleanups
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-12 19:31:35 -07:00
Ralph Castain
0500cc1c66
Update the debugger launch code to reflect the new backend mapping method.
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-12 13:31:18 -07:00
Ralph Castain
539f71d0cc
Merge pull request #3310 from marksantcroos/fix/alps_wdir
...
Bring ALPS ODLS up to par regarding wdir.
2017-04-11 17:30:04 -07:00
Mark Santcroos
27fa8aabd6
Hardcode basename to "orted" for error reporting.
...
Signed-off-by: Mark Santcroos <mark.santcroos@rutgers.edu>
2017-04-11 18:59:23 -04:00
Mark Santcroos
af3a6e1a29
Verify that the chdir(2) succeeds.
...
Signed-off-by: Mark Santcroos <mark.santcroos@rutgers.edu>
2017-04-12 00:37:37 +02:00
Ralph Castain
97e38e6d84
Move a free to a little later in case the verbose output needs it
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-11 11:21:12 -07:00
Mark Santcroos
36ac54b5d8
Bring ALPS ODLS up to par regarding wdir.
...
Signed-off-by: Mark Santcroos <mark.santcroos@rutgers.edu>
2017-04-10 08:15:07 -04:00
Ralph Castain
95ae0d1df3
Clean up timing macros for portability across compilers. Rename the --enable-timing configure option to --enable-pmix-timing so it doesn't pick up external timing requests. Remove a stale function reference in PMIx so it can compile with timing enabled.
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-10 12:56:38 +06:00
Artem Polyakov
482d7c9322
opal/timing: remove RML timings
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-04-07 21:16:21 +06:00
Artem Polyakov
79100de014
opal/timing: Remove oob tracing
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-04-07 21:16:21 +06:00
Ralph Castain
b33b4607df
Correctly identify the source of the event when notifying of abnormal termination by a process
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-06 20:50:38 -07:00
Ralph Castain
a29ca2bb0d
Enable slurm operations on Cray with constraints
...
Clean up some errors in the nidmap code that caused us to send unnecessary topologies
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-06 08:58:06 -07:00
Ralph Castain
40ca43e157
Set the PARENT vpid for direct routed module
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-04 19:03:28 -07:00
Ralph Castain
9cb18b8348
Merge pull request #3280 from rhc54/topic/dvm
...
Fix the DVM by ensuring that all nodes, even those that didn't partic…
2017-04-04 18:15:33 -07:00
Ralph Castain
74863a0ea4
Fix the DVM by ensuring that all nodes, even those that didn't participate (i.e., didn't have any local children) in a job, clean up all resources associated with that job upon its completion. With the advent of backend distributed mapping, nodes that weren't part of the job would still allocate resources on other nodes - and then start from that point when mapping the next job. This change ensures that all daemons start from the same point each time.
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-04 17:31:38 -07:00
Nathaniel Graham
7063f3021f
Merge pull request #3231 from nrgraham23/revamp_mpirun_help
...
mpirun --help output revamp
2017-04-04 12:32:20 -06:00
Nathaniel Graham
19e5d15491
mpirun --help output revamp
...
This commit modifies the output from the mpirun --help
command. The options have been split into groups, to
make the output smaller and more readable. The groups
are: general, debug, output, input, mapping, ranking,
binding, devel, compatibility, launch, dvm, and
unsupported. There is also a special "full" command
that can be used to get the old behaviour of printing
out all of the options. Unsupported options may only
be seen with this full output.
This commit also adds a special case for the help
argument. It makes it possible for the user to
enter 0 or 1 arguments instead of having to always
enter an argument. This defaults to printing out
the "general" help options so the user can then
see what help arguments there are.
Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
2017-04-04 10:59:32 -06:00
Ralph Castain
393c4536eb
Remove stale code line
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-04 08:13:15 -07:00
Ralph Castain
92c996487c
Update how we pass the node regex so we pass _all_ nodes, even those without daemons. This allows the backend daemons to form a complete picture of the allocation. Include info on which nodes have daemons on them, and populate that info on the backend as well.
...
Set the daemons' state to "running" and mark them as "alive" by default when constructing the nidmap
Get the DVM running again
Fix direct modex by eliminating race condition caused by releasing data while sending it
Up the size limit before compressing
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-03 19:25:15 -07:00
Ralph Castain
583dbe954c
Silence coverity dead-code warning
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-26 20:36:43 -07:00
Ralph Castain
ecc8000136
Silence a flood of warnings when compiling with gcc on Cray
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-24 13:37:11 -06:00
Ralph Castain
35f817911e
Fix coverity issues
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-24 08:09:46 -07:00
Ralph Castain
ea84a53faa
Merge pull request #3218 from rhc54/topic/pmix2
...
Update to include the PMIx 2.0 APIs for monitoring and job control.
2017-03-21 20:11:10 -07:00
Ralph Castain
d645557fa0
Update to include the PMIx 2.0 APIs for monitoring and job control. Include required integration, but leave the monitors off for now. Move the sensor framework out of ORTE as it is being absorbed into PMIx
...
Fix typo and silence warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-21 17:47:08 -07:00
Ralph Castain
10d401b6ec
Merge pull request #3217 from rhc54/topic/wdirs
...
Resolve a race condition for setting our working directory when fork/exec'ing application procs.
2017-03-21 17:39:54 -07:00
Ralph Castain
74fd2c30af
Cleanup alps odls module
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-21 17:41:11 -06:00
Ralph Castain
f8e1e3bed3
Ensure we properly exit with error if we cannot map the job
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-21 15:15:32 -07:00
Ralph Castain
75684dc260
Resolve a race condition for setting our working directory when fork/exec'ing application procs. We have to ensure we do it after the fork occurs since we want to use multiple threads in the odls. Otherwise, the different threads are bouncing the entire process around.
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-21 13:54:03 -07:00
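Note on the working-directory race above: the current directory is a per-process attribute, so calling chdir(2) in a multithreaded launcher before fork moves every thread at once. Changing directory in the child, after fork and before exec, confines the change to the new process. The sketch below illustrates that ordering under stated assumptions. The helper name run_in_dir and the exit codes 126 and 127 are hypothetical, not taken from the odls code. It also checks the chdir result, in the spirit of the later "Verify that the chdir(2) succeeds." commit.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not an Open MPI function): fork, change to wdir
 * in the child only, then exec path with argv. Returns the child's exit
 * status, or -1 if fork/wait failed or the child died from a signal. */
int run_in_dir(const char *wdir, const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }

    if (pid == 0) {
        /* Child: only async-signal-safe calls between fork and exec,
         * since the parent may be multithreaded. */
        if (chdir(wdir) != 0) {
            _exit(127);             /* assumed code: chdir failed */
        }
        execv(path, argv);
        _exit(126);                 /* assumed code: exec failed */
    }

    /* Parent: its own working directory was never touched. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return -1;
        }
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the parent never calls chdir, several launcher threads can run this helper concurrently with different directories without affecting one another.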
Howard Pritchard
9350aa5d71
orte/ras: remove loadleveler support
...
Remove loadleveler as it is obsolescent and is no longer supported.
Fixes #3167
We'll wait for final check of whether or not loadleveler even
compiles/functions before merging this.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-03-21 10:32:28 -06:00
Ralph Castain
dc85e7fde7
Provide a little more help on the error messages when an executable isn't found so we have some better idea where we were looking for it. Don't double-report such errors. Ensure the ORTE_ERROR_NAME doesn't get a NULL back for the string name of an error code as that might cause some systems to segfault
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-17 09:54:37 -07:00
Howard Pritchard
1709febdea
Merge pull request #3166 from hppritcha/topic/swat_state_orted_comp_warning
...
ORTED: swat another compiler warning
2017-03-15 08:40:59 -06:00
Ralph Castain
96d7d10c1d
Merge pull request #3170 from rhc54/topic/reg
...
Ensure the backend daemons know if we are in a managed allocation and if the HNP was included in the allocation
2017-03-14 12:48:09 -07:00
Ralph Castain
61a71e25ef
Ensure the backend daemons know if we are in a managed allocation and if
...
the HNP was included in the allocation
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-14 10:06:43 -07:00
Howard Pritchard
5daaf7f3fd
ORTED: swat another compiler warning
...
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-03-14 08:41:51 -06:00
Ralph Castain
52c9e631de
Silence Coverity warnings
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-14 07:30:42 -07:00
Ralph Castain
b1a01d77ae
Update the TM module to support regex passing
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-13 21:50:40 -07:00
Ralph Castain
bb574a41df
Update launchers to get correct regex
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-13 11:21:44 -07:00
Ralph Castain
105fb152e1
Silence Coverity warnings
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-13 08:38:51 -07:00
Ralph Castain
b9f5cab710
Add a minor debug statement
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-12 18:15:44 -07:00
Gilles Gouaillardet
23d44a5284
sensor/base: initialize orte_sensor_base global variable
...
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-03-13 09:39:43 +09:00