Commit graph

1285 commits

Author SHA1 Message Date
Ralph Castain
cb55c88a8b Update to sync with OMPI master and cleanup to build
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-11-22 14:24:54 -08:00
Gilles Gouaillardet
023d18abae pml/ob1: mca_pml_ob1_recv must have memchecker mark the buffer as defined upon success
this is generally done in mca_pml_ob1_recv_request_free(), but this is not invoked
via mca_pml_ob1_recv(), so do it manually

Thanks Yvan Fournier for the report

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-11-07 13:10:15 +09:00
Yossi Itigin
17c8f76411 pml_ucx: fix uninitialized field req_status->_cancelled.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2016-11-01 17:02:22 +02:00
Gilles Gouaillardet
af67183e2f pml/v: fix a memory leak
close the framework if no more component should be used

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-10-28 09:32:30 +09:00
Gilles Gouaillardet
8e788b5aee pml/ob1: refactor append_recv_req_to_queue() to improve readability
and fix a typo in a comment

Thanks George for the patch
2016-10-25 10:50:40 +09:00
Gilles Gouaillardet
4a886ac4cc pml/ob1: correctly reset receive request type before init
recvreq->req_recv.req_base.req_type should always be set before invoking
MCA_PML_OB1_RECV_REQUEST_INIT(recvreq, ...) otherwise, the previous type
might be set, and you could end up with MCA_PML_REQUEST_IMPROBE when
MCA_PML_REQUEST_RECV is expected.

Thanks Chris Pattison for the report and test case.

Fixes open-mpi/ompi#2275
2016-10-24 16:50:23 +09:00
Yossi Itigin
05ca466c6b ucx: adapt pml_ucx and spml_ucx to new UCX APIs
- pass field_mask to ucp_init().
- use non-blocking disconnect.
- recv() with pre-allocated request.
- call opal_progress() from iprobe() and improbe().
- use shift pattern in connect/disconnect.
2016-10-12 23:45:45 +03:00
George Bosilca
295eec7059 Small fix for persistent receives.
A minor optimization, a few typo fixes and extra comments
2016-09-16 10:27:32 -04:00
Nathan Hjelm
7589a25377 osc/pt2pt: do not repost receive from request callback
This commit fixes an issue that can occur if a target gets overwhelmed with
requests. This can cause osc/pt2pt to go into deep recursion with a stack
like req_complete_cb -> ompi_osc_pt2pt_callback -> start -> req_complete_cb
-> ... . At small scale this is fine as the recursion depth stays small but
at larger scale we can quickly exhaust the stack processing frag requests.
To fix the issue the request callback now simply puts the request on a
list and returns. The osc/pt2pt progress function then handles the
processing and reposting of the request.

As part of this change osc/pt2pt can now post multiple fragment receive
requests per window. This should help prevent a target from being overwhelmed.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-08-11 15:33:07 -06:00
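The deferral pattern described in 7589a25377 can be sketched as follows. This is a minimal illustration, not the osc/pt2pt code: the names `request_t`, `pending_queue_t`, `on_request_complete` and `progress` are hypothetical, and the sketch is single-threaded (a real implementation would protect the list with a lock or use a lock-free list).

```c
#include <stddef.h>

/* Hypothetical request type; only the list link matters here. */
typedef struct request {
    struct request *next;
    int id;
} request_t;

/* Hypothetical FIFO of completed requests awaiting processing. */
typedef struct {
    request_t *head;
    request_t *tail;
    int processed;   /* number of requests handled so far */
} pending_queue_t;

/* Completion callback: only enqueue the request and return.
 * Nothing is processed or reposted here, so the callback can
 * never re-enter itself and the stack depth stays constant. */
static void on_request_complete(pending_queue_t *q, request_t *req)
{
    req->next = NULL;
    if (q->tail != NULL) {
        q->tail->next = req;
    } else {
        q->head = req;
    }
    q->tail = req;
}

/* Progress function: drain the queue with a loop, not recursion.
 * Processing and reposting of each request would happen here. */
static int progress(pending_queue_t *q)
{
    int handled = 0;
    while (q->head != NULL) {
        request_t *req = q->head;
        q->head = req->next;
        if (q->head == NULL) {
            q->tail = NULL;
        }
        (void) req;      /* placeholder for the real processing */
        q->processed++;
        handled++;
    }
    return handled;
}
```

Because the callback only appends to the list, the stack depth no longer depends on how many requests arrive back to back.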
Gilles Gouaillardet
dfbf2b7be4 opal/threads: add OPAL_THREAD_SUB_SIZE_T macro
-1 is not a valid size_t, so instead of OPAL_THREAD_ADD_SIZE_T(..., -1),
simply OPAL_THREAD_SUB_SIZE_T(..., 1) and keep picky compilers happy
2016-08-10 13:37:36 +09:00
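A rough illustration of the idea in dfbf2b7be4, using C11 atomics rather than the OPAL macros. The helpers `counter_add` and `counter_sub` are hypothetical stand-ins, not the Open MPI implementation. Passing -1 to an add on a size_t only works through unsigned wraparound, which picky compilers flag, whereas a dedicated subtract takes a plain positive value.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for an atomic add on a size_t counter.
 * Returns the new value. */
static inline size_t counter_add(atomic_size_t *counter, size_t delta)
{
    return atomic_fetch_add(counter, delta) + delta;
}

/* Hypothetical stand-in for the subtract variant: the caller
 * passes a positive delta instead of converting -1 to size_t.
 * Returns the new value. */
static inline size_t counter_sub(atomic_size_t *counter, size_t delta)
{
    return atomic_fetch_sub(counter, delta) - delta;
}
```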
Nathan Hjelm
799104f688 Merge pull request #1947 from hjelmn/perf
pml/ob1: be more selective when using rdma capable btls
2016-08-09 22:15:09 -06:00
Nathan Hjelm
4079eec974 pml/ob1: be more selective when using rdma capable btls
This commit updates the btl selection logic for the RDMA and RDMA
pipeline protocols to use a btl iff: 1) the btl is also used for eager
messages (high exclusivity), or 2) no other RDMA btl is available on
an endpoint and the pml_ob1_use_all_rdma MCA variable is true. This
fixes a performance regression with shared memory when an RDMA capable
network is available.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-09 20:54:42 -06:00
Nathan Hjelm
889dd32806 pml/ob1: reset req_bytes_packed on start
On start we were not correctly resetting all request fields. This was
leading to a double-completion on persistent receives. This commit
updates the base start code to reset the receive req_bytes_packed and
the send request convertor.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-03 11:29:30 -06:00
Joshua Ladd
06930a0423 Merge pull request #1840 from artpol84/yalla_perf_fix
pml/yalla: fix yalla performance regression
2016-07-14 10:55:30 +03:00
George Bosilca
eac5b3c668 Various cleanups in the monitoring PML. 2016-07-05 18:31:25 +02:00
Artem Polyakov
a4ff9bef6d fix #2 2016-07-05 14:38:35 +03:00
Artem Polyakov
bc973cad30 fix 2016-07-05 14:33:31 +03:00
Artem Polyakov
7d96f12fec pml/yalla: fix yalla performance regression
It was introduced in PR https://github.com/open-mpi/ompi/pull/1228
 in particular in commit 041a6a9f53.

 The original solution used a "flexible array member" called "mxm_base"
 to "fall through" to the "mxm" send/recv member that is located in the
 outer structure.

 After changing the number of elements in "mxm_base" from 0 to 1 we are
 actually allocating 2 mxm_req_base_t elements, which leads to increased
 overall size and harms cache performance.

 It also breaks the "mca_pml_yalla_check_request_state" function.
2016-07-05 10:52:48 +03:00
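The size effect described in 7d96f12fec can be reproduced with a small sketch. The types below are hypothetical and unrelated to the real mxm structures, and the sketch uses the C99 `[]` form rather than a zero-length array. A flexible array member adds no storage to the enclosing struct, while a one-element array adds a full element.

```c
#include <stddef.h>

/* Hypothetical element type standing in for a request base. */
typedef struct {
    long state;
    long tag;
} req_base_t;

/* Flexible array member: contributes no storage of its own. */
typedef struct {
    int kind;
    req_base_t base[];
} hdr_flex_t;

/* One-element array: embeds a full req_base_t in every header. */
typedef struct {
    int kind;
    req_base_t base[1];
} hdr_one_t;

/* Extra bytes paid per object when the array has one element. */
static size_t extra_bytes(void)
{
    return sizeof(hdr_one_t) - sizeof(hdr_flex_t);
}
```

On common ABIs the difference is one element, so if the outer structure already places the real member right after the header, each request ends up carrying two of them.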
Nathan Hjelm
5d0b4679ea pml/ob1: bug fixes
This commit fixes two bugs in pml/ob1:

 - Do not call MCA_PML_OB1_PROGRESS_PENDING from
   mca_pml_ob1_send_request_start_copy as this may lead to a recursive
   call to mca_pml_ob1_send_request_process_pending.

 - In mca_pml_ob1_send_request_start_rdma return the rdma frag object
   if a btl fragment can not be allocated. This fixes a leak
   identified by @abouteiller and @bosilca.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-06 17:54:55 -06:00
Nathan Hjelm
e968ddfe64 start bug fixes (#1729)
* mpi/start: fix bugs in cm and ob1 start functions

There were several problems with the implementation of start in Open
MPI:

 - There are no checks whatsoever on the state of the request(s)
   provided to MPI_Start/MPI_Start_all. It is erroneous to provide an
   active request to either of these calls. Since we are already
   looping over the provided requests there is little overhead in
   verifying that the request can be started.

 - Both ob1 and cm were always throwing away the request on the
   initial call to start and start_all with a particular
   request. Subsequent calls would see that the request was
   pml_complete and reuse it. This introduced a leak as the initial
   request was never freed. Since the only pml request that can
   be mpi complete but not pml complete is a buffered send the
   code to reallocate the request has been moved. To detect that
   a request is indeed mpi complete but not pml complete isend_init
   in both cm and ob1 now marks the new request as pml complete.

 - If a new request was needed the callbacks on the original request
   were not copied over to the new request. This can cause osc/pt2pt
   to hang as the incoming message callback is never called.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>

* osc/pt2pt: add request for gc after starting a new request

Starting a new receive may cause a recursive call into the pt2pt
frag receive function. If this happens and the prior request is
on the garbage collection list it could cause problems. This commit
moves the gc insert until after the new request has been posted.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-06-02 20:22:40 -04:00
George Bosilca
2e1b1d34c6 Safety first ! 2016-06-02 11:52:43 +09:00
Nathan Hjelm
086ffc1838 pml/ob1: fix race on pml completion of send requests
The request code was setting the request as pml_complete before
calling MCA_PML_OB1_SEND_REQUEST_MPI_COMPLETE. This was causing
MCA_PML_OB1_SEND_REQUEST_RETURN to be called twice in some cases. The
code now mirrors the recvreq code and only sets the request as pml
complete if the request has not already been freed.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-01 13:36:06 -06:00
Thananon Patinyasakdikul
60d0fbf683 Removal of ompi_request_lock from pml/ucx. 2016-05-26 12:36:58 -04:00
George Bosilca
90f294096e Remove more references to the request mutex.
Regarding BFO it should be mentioned that this component is currently
unmaintained, and that despite my efforts I could not make it compile
(it would not compile before this patch either).
2016-05-25 23:27:06 -04:00
Nathan Hjelm
9d439664f0 pml/yalla: update for request changes
This commit brings the pml/yalla component up to date with the request
rework changes.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-25 15:42:53 -06:00
Nathan Hjelm
8445c885ce pml/cm: update for request changes
This fixes a hang caused by the request refactor work. The cm pml was
not updated and was hanging in most cases.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-25 15:35:32 -06:00
bosilca
b90c83840f Refactor the request completion (#1422)
* Remodel the request.
Added the wait sync primitive and integrate it into the PML and MTL
infrastructure. The multi-threaded requests are now significantly
less heavy and less noisy (only the threads associated with completed
requests are signaled).

* Fix the condition to release the request.
2016-05-24 18:20:51 -05:00
Aurélien Bouteiller
7f65c2b18e forgot to update copyright in commits 627a89b 4899c89 2016-05-13 11:34:59 -04:00
George Bosilca
37e03e3e5b Don't update req_bytes_received if no bytes were received. 2016-05-12 23:39:32 -04:00
Aurélien Bouteiller
627a89bf71 Fix a race condition when multiple threads do the "first send" to an endpoint simultaneously. 2016-05-05 09:04:10 -04:00
Jeff Squyres
265e5b9795 Merge pull request #1552 from kmroz/wip-hostname-len-cleanup-1
ompi/opal/orte/oshmem/test: max hostname length cleanup
2016-05-02 09:44:18 -04:00
George Bosilca
bf190671e9 Make the request lock recursive.
If during the request completion callback we post another request that
completes right away (such as a small send or a match for an unexpected
short message) we will try to complete the second request while holding
the lock for the completion of the first. For performance reasons
(mainly to avoid unlocking and locking the request mutex several times)
we have made the request lock recursive.
2016-04-26 16:16:07 -04:00
Nathan Hjelm
c16e639b2f Merge pull request #1563 from hjelmn/ompi_coverity
ompi coverity fixes
2016-04-26 09:17:48 -06:00
Karol Mroz
3322347da9 ompi: fixup hostname max length usage
Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
2016-04-25 07:08:23 +02:00
Nathan Hjelm
ae0ffbb67f Merge pull request #1397 from hjelmn/enable_thread_multiple
ompi: always enable MPI_THREAD_MULTIPLE support
2016-04-23 08:40:22 -06:00
Nathan Hjelm
1ff3d3b16b pml/ob1: fix coverity issue
Fix CID 1357978 (1 of 1): Logically dead code (DEADCODE):

Remove duplicate check for NULL == endpoint.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-19 14:48:13 -06:00
George Bosilca
896f857fc4 Thanks @hjelmn for catching the typo. 2016-04-07 13:56:26 -04:00
Thananon Patinyasakdikul
92290b94e0 Fixed Coverity reports 1358014-1358018 (DEADCODE and CHECK_RETURN) 2016-04-07 12:52:17 -04:00
Jeff Squyres
91c54d7a07 Merge pull request #1491 from ICLDisco/progress_thread
BTL TCP async progress
2016-03-29 06:26:10 -04:00
George Bosilca
f69eba1bc4 Update the copyright and cleanup the code.
Per @jsquyres suggestion remove all trailing spaces.
Credit to `sed -i.bak 's/ *$//' */[ch]`.
2016-03-28 14:41:01 -04:00
Thananon Patinyasakdikul
92062492b9 Enable Threading in the BTL TCP
Added mca parameter to turn progress thread on/off
Add a flag to check if we have btl progress thread.
Added macro for ob1 matching lock.
Update the AUTHORS file.
2016-03-28 14:41:01 -04:00
Nathan Hjelm
9d5eeecb8a pml/ob1: detect unreachable errors
This commit adds code to detect when procs are unreachable when using
the dynamic add_procs functionality.

Fixes #1501

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-28 10:52:40 -06:00
Nathan Hjelm
d4afb16f5a opal: rework mpool and rcache frameworks
This commit rewrites both the mpool and rcache frameworks. Summary of
changes:

 - Before this change a significant portion of the rcache
   functionality lived in mpool components. This meant that it was
   impossible to add a new memory pool to use with rdma networks
   (ugni, openib, etc) without duplicating the functionality of an
   existing mpool component. All the registration functionality has
   been removed from the mpool and placed in the rcache framework.

 - All registration cache mpools components (udreg, grdma, gpusm,
   rgpusm) have been changed to rcache components. rcaches are
   allocated and released in the same way mpool components were.

 - It is now valid to pass NULL as the resources argument when
   creating an rcache. At this time the gpusm and rgpusm components
   support this. All other rcache components require non-NULL
   resources.

 - A new mpool component has been added: hugepage. This component
   supports huge page allocations on linux.

 - Memory pools are now allocated using "hints". Each mpool component
   is queried with the hints and returns a priority. The current hints
   supported are NULL (uses posix_memalign/malloc), page_size=x (huge
   page mpool), and mpool=x.

 - The sm mpool has been moved to common/sm. This reflects that the sm
   mpool is specialized and not meant for any general
   allocations. This mpool may be moved back into the mpool framework
   if there is any objection.

 - The opal_free_list_init arguments have been updated. The unused0
   argument is now used to pass in the registration cache module. The
   mpool registration flags are now rcache registration flags.

 - All components have been updated to make use of the new framework
   interfaces.

As this commit makes significant changes to both the mpool and rcache
frameworks both versions have been bumped to 3.0.0.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-14 10:50:41 -06:00
Joshua Ladd
4dffae2f88 Fixing MXM Yalla and MTL add procs behavior. MXM cannot support dynamic add procs, so propagate this info to the MTL and PML layers. 2016-03-08 01:46:24 +02:00
Aurélien Bouteiller
892e1ed57e Fix a potential race condition in which a progress matching thread could match a request while we are cancelling it. 2016-03-01 16:43:45 -05:00
George Bosilca
dbe93b0b19 Use mca_bml_base_get_endpoint
Correctly use mca_bml_base_get_endpoint instead of accessing the
endpoint directly.
2016-02-25 11:00:30 -06:00
Sylvain Jeaugey
5f32f49eb8 pml/ob1: Fix segmentation fault on CUDA path.
Fix segfault due to mca_pml_ob1_cuda_need_buffers not handling the case of the
endpoint not being there. Calling mca_bml_get_endpoint() seems to fix the problem.

Fixes open-mpi/ompi#1402
2016-02-24 21:32:25 -08:00
Nathan Hjelm
230d04327e ompi: always enable MPI_THREAD_MULTIPLE support
This commit removes the --with-mpi-thread-multiple option and forces
MPI_THREAD_MULTIPLE support. This cleans up an abstraction violation
in opal where OMPI_ENABLE_THREAD_MULTIPLE determines whether the
opal_using_threads is meaningful. To reduce the performance hit on
MPI_THREAD_SINGLE programs an OPAL_UNLIKELY is used for the
check on opal_using_threads in OPAL_THREAD_* macros.

This commit does not clean up the arguments to the various functions
that take whether multi-threading support is enabled. That should be
done at a later time.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-02-23 10:02:14 -07:00
George Bosilca
68c36ea9dc Fix two annoying warnings in our UCX support. 2016-02-14 00:02:16 -05:00
Gilles Gouaillardet
030a5f2054 sentinel: use type uintptr_t for sentinel
MSB is now automatically cleared when right shifting
Thanks George for pointing this out
2016-02-10 11:28:56 +09:00
Joshua Ladd
18c5a21562 Fix typo in error handling flow. 2016-01-14 22:28:54 +02:00
Joshua Ladd
afa62d8ca1 Addressing reviewers' comments for https://github.com/open-mpi/ompi-release/pull/891 2016-01-14 19:22:27 +02:00
Tomislav Janjusic
3858bc8e62 Adding support for dynamic endpoint creation
Signed-off-by: Tomislav Janjusic <tomislavj@mngx-apl-01.mtl.labs.mlnx>
Signed-off-by: Tomislavj Janjusic <tomislavj@mellanox.com>
Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>
2016-01-12 22:17:03 +02:00
Gilles Gouaillardet
ad9693c604 pml/yalla: add missing #include <alloca.h> 2015-12-24 14:33:58 +09:00
Gilles Gouaillardet
b38c17dbcb pml/cm: add missing #include <alloca.h>
Thanks Paul Hargrove for reporting this issue
2015-12-24 14:33:58 +09:00
Ralph Castain
ac6289dca6 Cleanup the warnings from the ompi layer when compiling optimized under Mac OSX
Cleanup per George's comments
2015-12-17 17:39:15 -08:00
igor.ivanov@itseez.com
041a6a9f53 ompi/pml: Fix warnings in yalla component 2015-12-16 16:22:30 +02:00
Alina Sklarevich
3ffd8dcd20 PML UCX: fix typo (following 7becc54d). 2015-12-10 13:51:10 +02:00
Nathan Hjelm
f68c315188 pml/ob1: add missing ompi_request_wait_completion for buffered sends
This commit adds a call to ompi_request_wait_completion for buffered
sends. Without this line it is possible to get into a state where the
data is never sent.

Fixes open-mpi/ompi#1185

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-12-07 22:28:07 -07:00
yosefe
3bb1270715 yalla: fix valgrind error due to uninitialized status field. 2015-11-19 10:59:31 +02:00
Yossi
b750b72a81 Merge pull request #1127 from yosefe/topic/pml-ucx-implement-cancel
pml_ucx: implement cancel, and add small optimizations.
2015-11-12 10:50:48 +02:00
yosefe
7becc54d67 pml_ucx: fix typo. 2015-11-12 09:57:41 +02:00
yosefe
d66b01d380 pml_ucx: implement cancel, and add small optimizations. 2015-11-10 17:40:06 +02:00
Gilles Gouaillardet
d6ff25b9a2 pml/monitoring: initialize common symbols 2015-11-10 13:58:54 +09:00
yosefe
45c3d04857 pml_ucx: fix request construct/destruct.
We should invoke OBJ_CONSTRUCT/OBJ_DESTRUCT only on regular requests
(which are embedded inside UCX requests) and for the completed request.
Persistent requests are already constructed/destructed by the free list.
This fixes an assertion in ompi_request_destruct.
2015-11-04 11:03:46 +02:00
George Bosilca
5c60e76669 Fix Coverity CIDs 1338021, 1338020, 1338019, 1338018. 2015-11-02 17:38:51 -05:00
George Bosilca
b77c203068 Add more comments and restore the progress, flags, max tag, and max
context_id from the original PML.
2015-10-31 17:13:35 -04:00
George Bosilca
3efd494972 Make sure the monitoring infrastructure works well with the
new dynamic add_procs.
2015-10-31 17:13:35 -04:00
Guillaume Papauré
86714ad91e change pml_monitoring_messages_count and pml_monitoring_messages_size pvars to use the start/stop features 2015-10-31 17:13:35 -04:00
George Bosilca
a43c2ce529 Fully integrate the monitoring with the MPI_T PVAR.
Writing to the pml_monitoring_flush variable will set the filename of
the output file.
Stopping a session for the pml_monitoring_flush will force the
generation of the monitoring output file (as long as the filename
is not NULL).
To reset the monitoring, one has to bind the pml_monitoring_flush to a
session.
2015-10-31 17:13:35 -04:00
George Bosilca
646a662721 Use the new group interface and add const to the PML send functions. 2015-10-31 17:13:35 -04:00
George Bosilca
5224a7ce4d Allow the pvar to be written by invoking the associated callback.
Use a PVAR to generate the monitoring dump of the information into a
file.

Use the PVAR to instruct the PML monitoring when to do the dump.
2015-10-31 17:13:35 -04:00
George Bosilca
df167f4177 Rewrite the close logic to be cleaner. 2015-10-31 17:13:35 -04:00
George Bosilca
c801ffde86 Use MPI_T variables to handle the flush in a more MPI-blessed way.
Code cleanup.

Update the monitoring test to use MPI_T variables.
2015-10-31 17:13:35 -04:00
George Bosilca
4f88c82500 Fix a conversion problem and add a comment about the lack of component
retain in the new component infrastructure.

Clean Makefile.am to fix "make distcheck".

Update the gitignore rules.
2015-10-31 17:13:35 -04:00
George Bosilca
80343a0d39 add ability to query pml monitoring results with MPI Tools interface
using performance variables "pml_monitoring_messages_count" and
"pml_monitoring_messages_size"

Per Brice's suggestion make all data count and message length be
uint64_t.
2015-10-31 17:13:35 -04:00
George Bosilca
a47d69202f Add a monitoring PML. This PML tracks all data exchanges by the processes
counting or not the collective traffic as a separate entity. The need
for such a PML is simply because the PMPI interface doesn't allow us to
identify the collective generated traffic.
2015-10-31 17:13:35 -04:00
Rolf vandeVaart
f2ff6e03ab Make CUDA 4.1 a requirement for CUDA-aware support.
Remove all related preprocessor conditionals.
2015-10-29 11:24:02 -04:00
yosefe
ae738d0434 pml_ucx: add pmi fence in del_procs 2015-10-28 18:34:36 +02:00
yosefe
41b6230be3 pml_ucx: fix debug macros, and initialize mpi request properly. 2015-10-28 10:59:25 +02:00
Nathan Hjelm
08e267b811 add_procs: add threading protection for dynamic add_procs
This commit adds protection to the group, ob1, and bml endpoint lookup
code. For ob1 and the bml a lock has been added. For performance
reasons the lock is only held if a bml or ob1 endpoint does not
exist. ompi_group_dense_lookup now uses opal_atomic_cmpset to ensure
the proc is only retained by the thread that actually updates the
group.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-21 16:13:41 -06:00
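The compare-and-set idea mentioned in 08e267b811 can be sketched with C11 atomics. This is an assumption-laden illustration, not the ompi_group_dense_lookup code: `proc_t` and `slot_lookup` are hypothetical names, and an empty slot is represented by NULL, whereas the real code may encode it differently.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical reference-counted process object. */
typedef struct {
    atomic_int refcount;
} proc_t;

/* Publish `resolved` into `slot` if the slot is still empty.
 * Only the thread whose compare-and-swap succeeds takes the
 * reference; every other thread just uses the stored pointer. */
static proc_t *slot_lookup(_Atomic(proc_t *) *slot, proc_t *resolved)
{
    proc_t *expected = NULL;
    if (atomic_compare_exchange_strong(slot, &expected, resolved)) {
        /* This thread updated the slot, so it retains the proc. */
        atomic_fetch_add(&resolved->refcount, 1);
        return resolved;
    }
    /* Another thread won the race; `expected` now holds its value. */
    return expected;
}
```

However many threads race on the same slot, the reference count is incremented exactly once.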
yosefe
cc76db8d39 ucx: reduce components priority to 5. 2015-10-21 17:38:25 +03:00
Mike Dubman
4ea13f10f6 Merge pull request #1008 from alex-mikheev/topic/ucx_support
UCX support for ompi and oshmem
2015-10-21 09:33:33 +03:00
yosefe
a313588337 ompi: Add UCX PML. 2015-10-20 19:46:06 +03:00
Nathan Hjelm
53f6b57c0a pml/cm: use the priority of the mtl component
This commit changes the priority of mtl components to be relative to
pml/ob1 and updates the mtl interface to expose this priority. cm now
sets its own priority based on the priority of the selected mtl
component.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-19 12:32:42 -06:00
Nathan Hjelm
bedd80214e pml/ob1: remove priority check
This commit removes code that checks the ob1 priority vs the previous
priority. The previous priority is meaningless here and may only cause
ob1 to disable itself when it shouldn't.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-19 12:32:41 -06:00
Nathan Hjelm
2fd176ac7f cm: fix selection priority
This patch removes a priority check that disables cm if the previous
pml had higher priority. The check was incorrect as coded and is
unnecessary as we finalize all but one pml anyway.

Fixes open-mpi/ompi#1035

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-19 12:32:26 -06:00
Nathan Hjelm
341b60dd57 Merge pull request #1029 from kawashima-fj/pr/ob1-fin-memory-leak
pml/ob1: Fix a memory leak regarding pending FIN control messages.
2015-10-15 07:55:52 -06:00
KAWASHIMA Takahiro
4e56505202 pml/ob1: Fix a memory leak regarding pending FIN control messages.
Once a FIN control message is appended to the pending list,
the ob1 PML attempts to send the FIN again in the `mca_pml_ob1_process_pending_packets` function.
But if the PML fails to send the FIN again, the `mca_pml_ob1_send_fin`
function creates a new `mca_pml_ob1_pckt_pending_t` object and the
old object is not returned to the free list.
2015-10-15 11:15:03 +09:00
Jeff Squyres
889d80a659 mxm/yalla: disable MPI dynamic process functionality
Disable the MPI dynamic process functionality when these components
are selected to be used.
2015-10-14 13:42:56 -07:00
Nathan Hjelm
12bd300c40 Merge pull request #929 from hjelmn/add_procs
Update add_procs support
2015-09-28 17:29:13 -06:00
Nathan Hjelm
6611c000c9 Fix coverity warnings
Fix CID 1315271: Constant expression result

The intent of this conditional is to not produce a peruse event for
probe or mprobe requests. Coverity is correct that the expression is
always true. Changed the || to && to fix. Also moved the conditional
within an OMPI_WANT_PERUSE to ensure the conditional is not evaluated
if peruse is disabled.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-28 15:35:25 -06:00
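The constant-expression defect fixed in 6611c000c9 follows a common shape, sketched below with hypothetical names (`req_type_t`, `wants_event_buggy`, `wants_event_fixed`) that are not taken from the ob1 source. A value can never equal two different constants at once, so joining the two inequality tests with `||` is always true.

```c
#include <stdbool.h>

/* Hypothetical request kinds. */
typedef enum { REQ_RECV, REQ_PROBE, REQ_MPROBE } req_type_t;

/* Buggy form: true for every input, because t cannot be
 * both REQ_PROBE and REQ_MPROBE at the same time. */
static bool wants_event_buggy(req_type_t t)
{
    return t != REQ_PROBE || t != REQ_MPROBE;
}

/* Fixed form: true only when t is neither probe kind. */
static bool wants_event_fixed(req_type_t t)
{
    return t != REQ_PROBE && t != REQ_MPROBE;
}
```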
George Bosilca
01d8e23ccc Fix the random errors related to the recursive sends and receives
identified by Fujitsu.
2015-09-26 00:44:51 +02:00
Nathan Hjelm
54a4061d88 Add support for detecting when dynamic add_procs is not possible
This commit adds support to the pml, mtl, and btl frameworks for
components to indicate at runtime that they do not support the new
dynamic add_procs behavior. At the high end the lack of dynamic
add_procs support is signalled by the pml using the new pml_flags
member to the pml module structure. If the
MCA_PML_BASE_FLAG_REQUIRE_WORLD flag is set MPI_Init will generate the
ompi_proc_t array passed to add_proc from ompi_proc_world () instead
of ompi_proc_get_allocated ().

Both cm and ob1 have been updated to detect if the underlying mtl and
btl components support dynamic add_procs.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-23 16:22:05 -06:00
Gilles Gouaillardet
a611274704 pml: fix commit open-mpi/ompi@6e6a3e965c
do not use the const modifier for allocator nor recv buffers
2015-09-18 09:54:18 +09:00
Nathan Hjelm
b4a0d40915 pml/ob1: Add support for dynamically calling add_procs
This commit contains the following changes:

 - pml/ob1: use the bml accessor function when requesting a bml
   endpoint. this will ensure that bml endpoints are only created when
   needed. for example, a bml endpoint is not requested and not
   allocated when receiving an eager message from a peer.

 - pml/ob1: change the pml_procs array in the ob1 communicator to a
   proc pointer array. at the cost of a single level of extra
   redirection this will allow us to allocate pml procs on demand.

 - pml/ob1: add an accessor function to access the pml proc structure
   for a given peer. this function will allocate the proc if it
   doesn't already exist.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
Gilles Gouaillardet
6e6a3e965c pml: do not cast away the const modifier when this is not necessary
update the pml framework and mpi c bindings
2015-09-09 09:18:57 +09:00
Ralph Castain
cf6137b530 Integrate PMIx 1.0 with OMPI.
Bring Slurm PMI-1 component online
Bring the s2 component online

Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways.

Bring the OMPI pubsub/pmi component online

Get comm_spawn working again

Ensure we always provide a cpuset, even if it is NULL

pmix/cray: adjust cray pmix component for pmix

Make changes so cray pmix can work within the integrated
ompi/pmix framework.

Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet

Cleanup comm_spawn - procs now starting, error in connect_accept

Complete integration
2015-08-29 16:04:10 -07:00
yosefe
85580ad055 yalla: fix passing on-demand mapping config to mxm. 2015-08-18 15:00:59 +03:00
Gilles Gouaillardet
6b2fe9120e yalla: fix Makefile.am LDFLAGS 2015-08-13 17:33:52 +09:00