1
1
Граф коммитов

2969 Коммитов

Автор SHA1 Сообщение Дата
Gilles Gouaillardet
022cca79ea pmix/ext1x: plug a memory leak in opal_lkupcbfunc()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 10:52:36 +09:00
Gilles Gouaillardet
f485d12a82 pmix: rename the ext11 component into ext1x
also use the same naming scheme thann pmix/ext2x

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 10:52:35 +09:00
Gilles Gouaillardet
dccb1899e6 pmix/ext11: correctly use PMIx_server_register_nspace()
PMIx_server_register_nspace() is an asynchronous operation, so
the pmix glue wait for it completes before returning.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 09:23:19 +09:00
Gilles Gouaillardet
6955e1e25c pmix/ext11: fix compilation
the argc field from the opal_pmix_app_t struct was removed,
so adjust the pmix/ext11 glue accordingly.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-30 09:23:18 +09:00
Howard Pritchard
fca45a2742 mca help: fix typo found by user
Fix typo found by @pozdneev

Fixes #2821

bot:notest

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-01-28 09:37:43 -07:00
Ralph Castain
3302864a7d Cleanup a typo that can cause a segfault - use a local variable name different than the one passed into the function
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-27 16:49:25 -08:00
Josh Hursey
2e64bf42fb Merge pull request #2810 from jjhursey/fix/ibm/stdiag-to-stdout
Extend options for stddiag routing
2017-01-26 14:29:16 -06:00
Josh Hursey
770c41f493 Merge pull request #2807 from jjhursey/fix/ibm/event-external
libevent/external: Add opal_event_include to this component
2017-01-26 14:26:50 -06:00
Jeff Squyres
2c277a66fd Merge pull request #2772 from jjhursey/topic/stacktrace-improv
master: opal/stacktrace improvements
2017-01-26 10:48:41 -08:00
Joshua Hursey
6d98559be9 stacktrace: Add flexibility in stacktrace ouptut
- New MCA option: opal_stacktrace_output
   - Specifies where the stack trace output stream goes.
   - Accepts: none, stdout, stderr, file[:filename]
   - Default filename 'stacktrace'
     - Filename will be `stacktrace.PID`, or if VPID is available,
       then the filename will be `stacktrace.VPID.PID`
 - Update util/stacktrace to allow for different output avenues
   including files. Previously this was hardcoded to 'stderr'.
 - Since opal_backtrace_print needs to be signal safe, passing it a
   FILE object that actually represents a file stream is difficult. This
   is because we cannot open the file in the signal handler using
   `fopen` (not safe), but have to use `open` (safe). Additionally, we
   cannot use `fdopen` to convert the `int fd` to a `FILE *fh` since it
   is also not signal safe.
   - I did not want to break the backtrace.h API so I introduced a new
     rule (documented in `backtrace.c`) that if the `FILE *file`
     argument is `NULL` then look for the `opal_stacktrace_output_fileno`
     variable to tell you which file descriptor to use for output.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-01-26 11:55:32 -06:00
Nathan Hjelm
fe1c6bd881 Merge pull request #2840 from hjelmn/event_fix
verbs: remove extra event user increment/decrement operation
2017-01-26 07:30:24 -08:00
Gilles Gouaillardet
896434b1bd pmix/ext2x: plug a memory leak in opal_lkupcbfunc()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-26 14:07:15 +09:00
Gilles Gouaillardet
6b8e1c217c pmix/ext2x: plug misc memory leaks
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-26 14:06:58 +09:00
Nathan Hjelm
9f28c0af39 verbs: remove extra event user increment/decrement operation
Since the oob and connections systems do not work the same way they
did in older versions of Open MPI these operations are no longer
necessary. At best they do nothing and at worst they hurt performance
by making us enter the event library more often in opal_progress().

Fixes #2839

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-01-25 18:37:06 -07:00
Gilles Gouaillardet
142b95df87 pmix/ext2x: plug misc memory leaks regarding opal_pmix2x_event_chain_t handling
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-25 16:17:10 +09:00
Gilles Gouaillardet
7a3d39f079 pmix/ext2x: plug a memory leak in _reg_nspace()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-25 16:17:01 +09:00
Joshua Hursey
dcd9801f7c orte/iof: Add orte_map_stddiag_to_stdout option
* Similar to `orte_map_stddiag_to_stderr` except it redirects `stddiag`
   to `stdout` instead of `stderr`.
 * Add protection so that the user canot supply both:
   - `orte_map_stddiag_to_stderr`
   - `orte_map_stddiag_to_stdout`

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-01-24 16:22:59 -06:00
Joshua Hursey
d6b306d716 libevent/external: Add opal_event_include to this component
* Adds a parameter to adjust the method used by libevent.
   - Matches that of the libevent2022 component.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-01-24 16:03:09 -06:00
Ralph Castain
ef86707fbe Deprecate the --slot-list paramaeter in favor of --cpu-list. Remove the --cpu-set param (mark it as deprecated) and use --cpu-list instead as it was confusing having the two params. The --cpu-list param defines the cpus to be used by procs of this job, and the binding policy will be overlayed on top of it.
Note: since the discovered cpus are filtered against this list, #slots will be set to the #cpus in the list if no slot values are given in a -host or -hostname specification.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-24 13:33:22 -08:00
Josh Hursey
c6595c2289 Merge pull request #2792 from jjhursey/topic/libevent-conf2
libevent2022: Fix broken configure AC_LANG_PROGRAM
2017-01-24 08:31:46 -06:00
Gilles Gouaillardet
682f5116aa Merge pull request #2781 from ggouaillardet/topic/misc_fixes_and_plugs
fix misc bugs and plug misc memory leaks
2017-01-24 14:41:45 +09:00
Joshua Hursey
72ac812039 libevent2022: Fix broken configure AC_LANG_PROGRAM
* Similar to commit 029964a748
   This removes an extra `int main` during configure.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-01-23 21:47:59 -06:00
Gilles Gouaillardet
189da7fdab pmix2x: plug a memory leak in _event_hdlr()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-24 09:13:30 +09:00
Gilles Gouaillardet
acbc32d3b2 pmix2x: plug a memory leak in opal_lkupcbfunc()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-24 09:13:29 +09:00
Gilles Gouaillardet
b5b21043c4 pmix2x: plug a memory leak in _reg_nspace()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-24 09:13:29 +09:00
Gilles Gouaillardet
0f47310a75 pmix2x/pmix2x_client: plug misc memory leaks
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-24 09:13:29 +09:00
Joshua Hursey
029964a748 libevent2022: Fix broken configure AC_LANG_PROGRAM
* The AC_LANG_PROGRAM macro adds the `main()` so it is erroneous
   to add it to the test program.
 * This was detected with the XL compilers which will fail to
   build the program in this situation. The GNU compiler does not
   error out or warn, but successfully compiles the program.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-01-23 13:44:12 -06:00
Ralph Castain
8c960bae8d Update to latest PMIx master
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-23 07:07:40 -08:00
George Bosilca
999d4973a9
Fix an issue with extremely large data identified by tjb900.
Due to the conversion from ssize_t to int we were losing bytes, and
ended up writing outside the receiver buffer. Similarly on the send,
due to the conversion to a lesser type, we could missinterpret the
end of the fragment.
2017-01-18 10:33:12 -05:00
Nathan Hjelm
91c34c8df6 Merge pull request #2703 from hjelmn/rcache_fix
rcache/base: do not release vma stuctures in vma_tree_delete
2017-01-12 09:53:34 -07:00
Jeff Squyres
938ab01ad6 Merge pull request #2714 from hjelmn/timer_rollover
timer/linux: prevent 64-bit overflow
2017-01-12 06:40:52 -05:00
Nathan Hjelm
45c05880aa timer/linux: prevent 64-bit overflow
The linux timer code was multiplying the result of the x86 time stamp
counter by 1000000 before dividing by the cpu frequency. This can
cause us to overflow 64 bits if the time stamp counter grows larger
than ~ 1.8e13 (about 8400 seconds after boot). To fix the issue the
units of opal_timer_linux_freq have been changed to MHz.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-01-11 20:03:10 -07:00
Gilles Gouaillardet
aeee48357a btl/sm: correctly handle nodes with zero NUMA hwloc object
the hwloc topology might not contain a NUMA object with hwloc < v2
if the node is not NUMA, so force the NUMA object count to one
in order to correctly allocate mca_btl_sm_component.sm_mpools.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-12 11:45:29 +09:00
Ralph Castain
31a8476223 Merge pull request #2702 from rhc54/topic/cov
Silence Coverity CID 1398541
2017-01-10 17:50:23 -08:00
Nathan Hjelm
79cabc92fd rcache/base: do not release vma stuctures in vma_tree_delete
This commit fixes a deadlock that can occur when the libc version
holds a lock when calling munmap. In this case we could end up calling
free() from vma_tree_delete which would in turn try to obtain the lock
in libc. To avoid the issue put any deleted vma's in a new list on the
vma module and release them on the next call to vma_tree_insert. This
should be safe as this function is not called from the memory hooks.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-01-10 16:58:07 -07:00
Ralph Castain
e568b211e4 Silence Coverity CID 1398541
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-10 15:30:50 -08:00
Jeff Squyres
b980e334dc usnic: add completion stats
This should probably not go to the v2.x branch, since it changes the
output format of the usnic stats.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:54 -08:00
Jeff Squyres
706f53bb01 usnic: ensure that stats string is always truncated
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:54 -08:00
Jeff Squyres
1fdd0fe228 usnic: add missing params to show_help() call
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:54 -08:00
Jeff Squyres
7048adec04 usnic: add some assert()s
Add some run-time assert checks for debug builds.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:32 -08:00
Jeff Squyres
2d28ccb5fd usnic: add verbose output of queue lengths
Show the actual RX/TX and CQ length returned by libfabric in verbose
output.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:32 -08:00
Jeff Squyres
bd5b8ed754 usnic: ensure that queues are long enough
Double check the queue lengths that we get back from libfabric to
ensure that they are at least as long as we need.  They *should* never
be shorter than we need, but let's just check to be sure.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:32 -08:00
Jeff Squyres
53dc75a89c usnic: ensure to reset flags on returned frags
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:31 -08:00
Jeff Squyres
c4d7876ca0 usnic: check send credits on data channel for data frags
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:31 -08:00
Jeff Squyres
879d25e5df usnic: ensure to check send credits for ACKs
Don't just blindly send ACKs; ensure that we have send credits before
doing so.  If we don't have any send credits, just don't send the ACK
(it'll come again soon enough; it's not a tragedy if we don't send it
now).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:06:31 -08:00
Jeff Squyres
7787dad4db usnic: ensure CQs are long enough
The libfabric usnic provider may give you back TX/RX queues that are
longer than you asked for.  So just use the TX/RQ/CQ lengths that we
asked for, regardless of what length comes back.

Additionally, keep the length of the priority channel CQ separate from
the length of the data CQ.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:03:53 -08:00
Jeff Squyres
b02d8c48f5 usnic: make the releasing safer
Since the usnic BTL is single-threaded in this area, there really is
no danger, but don't use one of the pointers hanging off the frag
after we return it to the freelist.  Instead, save the endpoint
pointer before returning the frag.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:03:53 -08:00
Jeff Squyres
e25b860627 usnic: clarify types
The types are technically typedef equivalent, but it's less confusing
to use the types that agree with the name of the constructor.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 12:03:53 -08:00
Jeff Squyres
40fe575132 usnic: trivial updates (no code/logic changes)
- Add more explanatory comments
- Trivial whitespace / style updates
- Rename opal_btl_usnic_force_retrans() -> opal_btl_usnic_fast_retrans()

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-10 10:40:02 -08:00
Gilles Gouaillardet
6d59b476de Merge pull request #2686 from ggouaillardet/topic/pmix2x_ptl_base_sendrecv
pmix2x: ptl/base: send header and message data together via writev()
2017-01-10 16:26:10 +09:00