Commit Graph

25966 Commits

Author SHA1 Message Date
Gilles Gouaillardet
44a66e208c threads: fix WAIT_SYNC_INIT with a zero count
WAIT_SYNC_INIT(sync,0); WAIT_SYNC_RELEASE(sync);
hung because sync->signaled was initialized to true, and
there is no reason to invoke WAIT_SYNC_SIGNALED(sync) before
WAIT_SYNC_RELEASE(sync).
This commit initializes sync->signaled to true unless the count is zero.

Thanks George for the review and guidance.
2016-09-07 10:03:40 +09:00
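The zero-count case described in this commit message can be illustrated with a small model. This is only a sketch under stated assumptions: `demo_sync_t` and the `demo_sync_*` functions are hypothetical stand-ins written with C11 atomics, not the Open MPI macros, and the flag is named `signaled` only to mirror the wording of the message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a wait-sync object (not the Open MPI type). */
typedef struct {
    atomic_int  count;     /* completions still expected */
    atomic_bool signaled;  /* flag the release path spins on while it is true */
} demo_sync_t;

/* Initialize the object; the flag starts as true unless the count is zero. */
static void demo_sync_init(demo_sync_t *sync, int count)
{
    atomic_init(&sync->count, count);
    /* With a zero count nobody will ever complete, so nobody would clear the flag. */
    atomic_init(&sync->signaled, count != 0);
}

/* Called once per completion; the last completion clears the flag. */
static void demo_sync_complete(demo_sync_t *sync)
{
    if (atomic_fetch_sub(&sync->count, 1) == 1) {
        atomic_store(&sync->signaled, false);
    }
}

/* Spin until the flag is cleared; returns how many times the loop spun. */
static unsigned long demo_sync_release(demo_sync_t *sync)
{
    unsigned long spins = 0;
    while (atomic_load(&sync->signaled)) {
        ++spins;
    }
    return spins;
}
```

In this model, initializing with a count of zero and releasing immediately returns without spinning. If the flag were set to true unconditionally, the same sequence would spin forever, which is the kind of hang the message describes.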
Jeff Squyres
722d5eecf1 orte proc_info.c: remove unused variable
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-09-06 16:38:15 -07:00
Nathan Hjelm
08d08e6c69 Merge pull request #2048 from hjelmn/pgi_asm
config: re-enable GCC inline ASM check for PGI
2016-09-06 16:24:15 -06:00
rhc54
39861ee987 Merge pull request #2058 from rhc54/topic/sync
Fix typo on the COLL_SYNC macro
2016-09-06 17:17:29 -05:00
Nathan Hjelm
27a2509fec Merge pull request #2051 from hjelmn/ppc_asm
opal/asm: updates to powerpc assembly
2016-09-06 15:13:28 -06:00
Ralph Castain
7f3fac48ab Fix typo on the COLL_SYNC macro
2016-09-06 12:43:07 -07:00
Josh Hursey
f6337f9eae Merge pull request #2047 from jjhursey/topic/mixed-host2
orte: !FQDN implementation to use opal_net_isaddr
2016-09-06 13:08:54 -05:00
rhc54
8b46118e87 Merge pull request #2057 from rhc54/topic/cid
Coverity fixes
2016-09-06 11:19:03 -05:00
Todd Kordenbrock
a17dff281d Merge pull request #1900 from PDeveze/mtl-portals4-short_msg-split_msg
Mtl portals4 short msg split msg
2016-09-06 11:14:19 -05:00
Ralph Castain
f85dcaee2a Fixes CID 1369067 and CID 1196684
Fixes CID 1369648
Fixes CID 1372409
2016-09-06 08:43:15 -07:00
Jeff Squyres
527efec4fb Merge pull request #2050 from jsquyres/pr/btl-tcp-help-messages
Add a show_help message to TCP BTL when peer unexpectedly disconnects
2016-09-06 09:40:31 -04:00
Jeff Squyres
1953e3406f btl/tcp: add show_help message when peer hangs up
We commonly see messages on the users list where a peer has hung up
because it has crashed.  Instead of having just a BTL_ERROR message,
make this a real opal_show_help() message that tells the user that the
peer unexpectedly hung up, and they should look into *why* that peer
hung up.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-09-06 09:40:03 -04:00
Gilles Gouaillardet
894be7860a gcc_builtin/atomic: Silence numerous warnings from Studio compilers
This commit adds selective use of a compiler-specific pragma to
silence the numerous warnings the Sun/Oracle/Studio compilers emit for
the GNU-style inline asm used in atomic.h.

Thanks Paul Hargrove for the initial patch and the guidance.
2016-09-06 09:07:16 +09:00
Gilles Gouaillardet
7b39d9065c Merge pull request #2054 from ggouaillardet/topic/mca_btl_tcp_proc_insert
btl/tcp: make mca_btl_tcp_proc_insert re-entrant
2016-09-06 08:54:38 +09:00
Gilles Gouaillardet
91e1200c14 ompi/request: correctly handle zero count in ompi_request_default_wait_{all,any,some}
2016-09-05 17:19:30 +09:00
Gilles Gouaillardet
4b208e4463 btl/tcp: make mca_btl_tcp_proc_insert re-entrant
otherwise bad things happen with
 --mca btl_tcp_progress_thread 1 (non default)
and
 --mca mpi_add_procs_cutoff 0 (default)
2016-09-05 15:57:34 +09:00
Artem Polyakov
74a11d7832 Fix session dir cleanup code.
2016-09-05 07:53:55 +03:00
Artem Polyakov
dc0ab674de Add PMIx key to provide RM with ability to indicate that it will clean up
session directories provided through OPAL_PMIX_TMPDIR,
OPAL_PMIX_NSDIR, OPAL_PMIX_PROCDIR
2016-09-05 07:48:44 +03:00
Artem Polyakov
81195ab724 Several fixes related to session directories:
* enable OMPI to retrieve paths from RM through PMIx
* cleanups related to tempdirs.
2016-09-05 07:48:44 +03:00
Ralph Castain
fb51d65049 Minor change: check for NULL before using the job map to avoid segfault when erroring out prior to creating the map
2016-09-04 07:53:12 -07:00
Alex Mikheev
439456ae96 OSHMEM: spml ikrit: fixes zero copy
Allow mxm to use zero copy in put() and get() for large messages.
2016-09-04 12:16:09 +03:00
Nathan Hjelm
a36bdfe69f opal/asm: updates to powerpc assembly
This commit contains the following changes:

 - There is a bug in the PGI 16.x betas for ppc64 that causes them to
   emit the incorrect instruction for loading 64-bit operands. If not
   cast to void * the operands are loaded with lwz (load word and
   zero) instead of ld. This does not affect optimized mode. The
   workaround is to cast to void *; it was implemented similarly to a
   workaround for an xlc bug.

 - Actually implement 64-bit add/sub. These functions were missing and
   fell back to the less efficient compare-and-swap implementations.

Thanks to @PHHargrove for helping to track this down. With this update
the GCC inline assembly works as expected with pgi and ppc64.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-09-02 23:47:47 -06:00
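The second bullet in this commit message contrasts a direct 64-bit add with a fallback built on compare-and-swap. The difference can be sketched with C11 atomics. Both function names below are hypothetical and the code is portable C, not the PowerPC inline assembly the commit touches.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Fallback style: emulate an atomic add with a compare-and-swap retry loop. */
static int64_t add64_via_cas(_Atomic int64_t *addr, int64_t delta)
{
    int64_t old = atomic_load(addr);
    /* On failure 'old' is refreshed with the current value, so just retry. */
    while (!atomic_compare_exchange_weak(addr, &old, old + delta)) {
    }
    return old + delta;
}

/* Direct style: one atomic read-modify-write, no retry loop in the source. */
static int64_t add64_direct(_Atomic int64_t *addr, int64_t delta)
{
    return atomic_fetch_add(addr, delta) + delta;
}
```

Both functions return the updated value. The compare-and-swap version may need several attempts under contention, which is one plausible reason a dedicated add/sub implementation is preferred.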
Jeff Squyres
95c6f6cfc0 btl/tcp: fix help message
It looks like one help message was accidentally pasted in the middle
of another.  Disentangle the two messages from each other, and
slightly tweak the one message to say that the job may also crash (in
addition to hanging).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-09-02 17:14:22 -04:00
rhc54
9c496f767b Merge pull request #1602 from rhc54/topic/psm
Enable PSM to support dynamic processes
2016-09-02 14:41:19 -05:00
Nathan Hjelm
795833bfac config: re-enable GCC inline ASM check for PGI
We disabled this support a long time ago. Probably safe to assume
whatever bug we were working around no longer exists.

Closes open-mpi/ompi#2044

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-09-02 12:44:08 -06:00
Joshua Hursey
fe937d1e82 orte: !FQDN implementation to use opal_net_isaddr
* Switch to use opal_net_isaddr() for checking if a name is an IP
   address - as it is a bit cleaner, and uses common functionality.
2016-09-02 13:31:49 -05:00
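Checking whether a name is an IP address, as this commit message describes, can be sketched with the POSIX `inet_pton()` call. The helper `name_is_ip_address()` is a hypothetical stand-in for illustration; it is not `opal_net_isaddr()`, and nothing here should be read as how that function is implemented.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical helper: true if 'name' parses as a numeric IPv4 or IPv6 address. */
static bool name_is_ip_address(const char *name)
{
    struct in_addr  v4;
    struct in6_addr v6;

    /* inet_pton() returns 1 only when the text is a valid address of that family. */
    return inet_pton(AF_INET, name, &v4) == 1 ||
           inet_pton(AF_INET6, name, &v6) == 1;
}
```

With this helper, `"192.168.1.10"` and `"::1"` are classified as addresses, while `"node01"` and `"node01.example.com"` are treated as host names.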
Ralph Castain
4e0788e9ad Enable PSM to support dynamic processes
Fix comm_spawn to correctly reference the actual parent process that requested the spawn when looking for the parent job object
2016-09-02 10:22:04 -07:00
Nathan Hjelm
3274203f8a Merge pull request #2046 from hjelmn/ugni_fix
btl/ugni: fix erroneous warning message
2016-09-02 10:32:28 -06:00
Nathan Hjelm
f93c1f2106 btl/ugni: fix erroneous warning message
This commit prevents the connection code from trying to connect an
endpoint if the directed datagram has been posted but not received.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-09-02 09:17:44 -06:00
Ralph Castain
5c9ea565b6 Update NEWS
2016-09-01 18:10:21 -07:00
Ralph Castain
34f04a7924 Remove spurious Makefile.am line
2016-09-01 15:31:09 -07:00
Nathan Hjelm
1ce5847e8b osc/rdma: add support for network AMOs
This commit adds support for using network AMOs for MPI_Accumulate,
MPI_Fetch_and_op, and MPI_Compare_and_swap. This support is only
enabled if the ompi_single_intrinsic info key is specified or the
acc_single_intrinsic MCA variable is set. This configuration
indicates to this implementation that no long accumulates will be
performed since these do not currently mix with the AMO
implementation.

This commit also cleans up the code somewhat. This includes removing
unnecessary struct keywords where the type is also typedef'd.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-09-01 15:47:33 -06:00
rhc54
fde6e6c6f8 Merge pull request #2043 from rhc54/topic/notifycomplete
Implement notification of completion on comm_spawn'd child jobs.
2016-09-01 16:42:30 -05:00
Ralph Castain
0ea1cff733 Implement notification of completion on comm_spawn'd child jobs. Add a configure flag to enable PMIx 3's shared memory datastore, and set it to disabled by default so that comm_spawn functions again. Will reverse the default once that feature is fully functional
2016-09-01 13:10:10 -07:00
Nathan Hjelm
43b2e3a844 Merge pull request #2041 from hjelmn/osc_pt2pt_fix
osc/pt2pt: do not use frag send to send lock request
2016-09-01 13:02:45 -06:00
Nathan Hjelm
cb1cb5ffed osc/pt2pt: do not use frag send to send lock request
This commit cleans up some code in the passive target path. The code
used the buffered frag control send path but it is more appropriate to
use the unbuffered one. This avoids checking structures that
should not be in use in this path.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-09-01 09:57:27 -06:00
Gilles Gouaillardet
184d53a018 oshmem: swap fields of oshmem_proc_data_t to prevent padding
previously, the definition was

struct oshmem_proc_data_t {
    int num_transports;
    char * transport_ids;
};

so on a 64-bit arch, the compiler would very likely insert 4 bytes of
padding between the two fields in order to have transport_ids aligned
2016-09-01 14:20:14 +09:00
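The padding described in this commit message can be measured with `offsetof`. The sketch below reuses the two field names from the message, but the struct names `int_first` and `ptr_first` and the helper functions are hypothetical. It is written against a typical ABI where a pointer is at least as large and as strictly aligned as an `int`.

```c
#include <assert.h>
#include <stddef.h>

/* Field order quoted in the commit message: int first, then pointer. */
struct int_first {
    int   num_transports;
    char *transport_ids;
};

/* Swapped order: pointer first, then int. */
struct ptr_first {
    char *transport_ids;
    int   num_transports;
};

/* Bytes of padding the compiler placed between the two members. */
static size_t gap_int_first(void)
{
    return offsetof(struct int_first, transport_ids) - sizeof(int);
}

static size_t gap_ptr_first(void)
{
    return offsetof(struct ptr_first, num_transports) - sizeof(char *);
}
```

On a typical 64-bit target `gap_int_first()` is 4 and `gap_ptr_first()` is 0. The swapped struct may still carry tail padding, so `sizeof` alone does not show the difference; the member offsets do.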
Gilles Gouaillardet
0a25420dac oshmem: get rid of oshmem_proc_t and use ompi_proc_t instead
store oshmem related per proc data in an oshmem_proc_data_t struct,
that is stored in the padding section of an ompi_proc_t

this data can be accessed via the OSHMEM_PROC_DATA(proc) macro

Fixes open-mpi/ompi#2023
2016-09-01 14:20:14 +09:00
Gilles Gouaillardet
0b8c58298d oob/usock: fix handling of orte_process_name_t *
orte_process_name_t is aligned on 32 bits, so it cannot simply be cast
to an int64_t. Use memcpy() instead

Thanks Paul Hargrove for the report
2016-09-01 13:18:02 +09:00
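The memcpy() approach mentioned in this commit message can be sketched as follows. `demo_name_t` is a hypothetical two-field struct standing in for a 32-bit-aligned name type; its field names and the use of `uint64_t` are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-bit-aligned name type with two 32-bit fields (assumed layout). */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} demo_name_t;

/* Pack the name into 64 bits without dereferencing a misaligned 64-bit pointer. */
static uint64_t demo_name_to_u64(const demo_name_t *name)
{
    uint64_t packed;
    _Static_assert(sizeof(uint64_t) == sizeof(demo_name_t), "sizes must match");
    memcpy(&packed, name, sizeof packed);
    return packed;
}

/* Reverse operation: copy the 64 bits back into a name. */
static demo_name_t demo_name_from_u64(uint64_t packed)
{
    demo_name_t name;
    memcpy(&name, &packed, sizeof name);
    return name;
}
```

Casting a `demo_name_t *` to a 64-bit integer pointer and dereferencing it could fault on architectures that require 8-byte alignment for such loads, and it also violates C aliasing rules. Copying the bytes avoids both problems.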
Gilles Gouaillardet
75b7ef97a0 coll/libnbc: fix nbc_ireduce when sendbuf == recvbuf
if sendbuf is equal to recvbuf, that should not be interpreted
as equivalent to MPI_IN_PLACE on the non root rank(s)

Thanks Valentin Petrov for the report
2016-09-01 10:19:05 +09:00
Gilles Gouaillardet
2969235324 libnbc: fix NBC_Copy for predefined datatypes
predefined datatypes such as MPI_LONG_DOUBLE_INT are not really contiguous,
so use span as returned by opal_datatype_span() instead of type extent,
otherwise data might be written above allocated memory.

Thanks Valentin Petrov for the report
2016-09-01 10:18:57 +09:00
Jeff Squyres
16fe18eb7c Merge pull request #2036 from edgargabriel/pr/datatype-refcount-fix
io/ompio: fix the reference count of basic datatypes used as etypes o…
2016-08-31 19:05:49 -04:00
Edgar Gabriel
be183cb3dd io/ompio: fix the reference count of basic datatypes used as etypes or ftypes.
2016-08-31 14:08:26 -05:00
rhc54
39d086e000 Merge pull request #2035 from rhc54/topic/memprofile
Provide a mechanism for obtaining memory profiles of daemons and application profiles for use in studying our memory footprint
2016-08-31 14:06:48 -05:00
Ralph Castain
39992d1ad7 Silence trivial Coverity warnings
2016-08-31 09:42:33 -07:00
Ralph Castain
c1050bc01e Provide a mechanism for obtaining memory profiles of daemons and application profiles for use in studying our memory footprint. Setting OMPI_MEMPROFILE=N causes mpirun to set a timer for N seconds. When the timer fires, mpirun will query each daemon in the job to report its own memory usage plus the average memory usage of its child processes. The Proportional Set Size (PSS) is used for this purpose.
2016-08-31 09:32:07 -07:00
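On Linux, a process's Proportional Set Size can be obtained by summing the `Pss:` lines of `/proc/<pid>/smaps`. The sketch below shows that summation on text that is already in memory. `sum_pss_kb()` is a hypothetical helper, and this is not a claim about how the commit itself gathers the numbers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sum every "Pss:" value (in kB) found in smaps-formatted text. */
static unsigned long sum_pss_kb(const char *smaps_text)
{
    unsigned long total = 0;
    const char *line = smaps_text;

    while (line != NULL && *line != '\0') {
        unsigned long kb;
        /* Matches only lines that start with "Pss:", not "Rss:" or "SwapPss:". */
        if (sscanf(line, "Pss: %lu kB", &kb) == 1) {
            total += kb;
        }
        /* Advance to the character after the next newline, if any. */
        line = strchr(line, '\n');
        if (line != NULL) {
            ++line;
        }
    }
    return total;
}
```

A caller would read the contents of the smaps file into a buffer and pass it to the helper. Averaging the results over a daemon's child processes would then give a figure of the kind the message describes.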
Jeff Squyres
ead9b6389a README: update for new mailman and main web sites
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-31 09:55:36 -04:00
Jeff Squyres
c33aaa5604 Merge pull request #1997 from jsquyres/pr/make-cma-configury-better
opal_check_cma: make consistent with rest of configury
2016-08-31 09:50:24 -04:00
Jeff Squyres
ba0ed2401a Merge pull request #2031 from jsquyres/pr/fix-fortran-runpath-detection
opal_setup_wrappers.m4: fix typo in Fortran rpath detection
2016-08-31 09:28:32 -04:00
rhc54
ed5846038b Merge pull request #2033 from rhc54/topic/state
Ensure that the "running" state is correctly updated
2016-08-31 01:50:38 -05:00