Commit graph

5913 commits

Author SHA1 Message Date
George Bosilca
646a662721 Use the new group interface and add const to the PML send functions. 2015-10-31 17:13:35 -04:00
George Bosilca
5224a7ce4d Allow the pvar to be written by invoking the associated callback.
Use a PVAR to generate the monitoring dump of the information into a
file.

Use the PVAR to instruct the PML monitoring when to do the dump.
2015-10-31 17:13:35 -04:00
George Bosilca
df167f4177 Rewrite the close logic to be cleaner. 2015-10-31 17:13:35 -04:00
George Bosilca
c801ffde86 Use MPI_T variables to handle the flush in a more MPI-blessed way.
Code cleanup.

Update the monitoring test to use MPI_T variables.
2015-10-31 17:13:35 -04:00
George Bosilca
4f88c82500 Fix a conversion problem and add a comment about the lack of component
retain in the new component infrastructure.

Clean Makefile.am to fix "make distcheck".

Update the gitignore rules.
2015-10-31 17:13:35 -04:00
George Bosilca
80343a0d39 Add the ability to query PML monitoring results with the MPI Tools interface
using performance variables "pml_monitoring_messages_count" and
"pml_monitoring_messages_size"

Per Brice's suggestion, make all data counts and message lengths
uint64_t.
2015-10-31 17:13:35 -04:00
George Bosilca
a47d69202f Add a monitoring PML. This PML tracks all data exchanges by the processes,
optionally counting the collective traffic as a separate entity. Such a
PML is needed simply because the PMPI interface doesn't allow us to
identify the collective-generated traffic.
2015-10-31 17:13:35 -04:00
Rolf vandeVaart
578385ca78 Merge pull request #1079 from rolfv/pr/cuda-require-41
Make CUDA 4.1 a requirement for CUDA-aware support
2015-10-29 12:56:22 -04:00
Nathan Hjelm
b1e3936261 Merge pull request #1078 from rolfv/pr/disable-osc-rdma-for-cuda
Disable the use of osc rdma when we detect a GPU buffer
2015-10-29 10:03:28 -06:00
Rolf vandeVaart
f2ff6e03ab Make CUDA 4.1 a requirement for CUDA-aware support.
Remove all related preprocessor conditionals.
2015-10-29 11:24:02 -04:00
Matias Cabral
8ebcac1b2c Merge pull request #1075 from matcabral/psm2_symbol_rename
Updated psm2 mtl with new externally exposed symbols of psm2.so 
Fixes open-mpi/ompi#1018
Fixes open-mpi/ompi#1021
2015-10-28 13:55:45 -07:00
Rolf vandeVaart
87a4cc6118 Disable the use of osc rdma when we detect a GPU buffer as it is not supported in that component.
This forces a failover to the osc pt2pt component. Fixes #1012
2015-10-28 14:47:45 -04:00
yosefe
ae738d0434 pml_ucx: add pmi fence in del_procs 2015-10-28 18:34:36 +02:00
Matias A Cabral
ed16d8e1cc Updated psm2 mtl with new externally exposed symbols of psm2.so
Fixes open-mpi/ompi#1018
Fixes open-mpi/ompi#1021
2015-10-28 09:12:33 -07:00
yosefe
41b6230be3 pml_ucx: fix debug macros, and initialize mpi request properly. 2015-10-28 10:59:25 +02:00
yohann
8bf1c95cdc mtl/ofi: Remove unused help messages. 2015-10-27 09:38:04 -07:00
Nathan Hjelm
69d403d42b Merge pull request #1054 from hjelmn/add_procs_threading
add_procs: add threading protection for dynamic add_procs
2015-10-27 09:27:13 -06:00
yohann
a111d66f0f mtl/ofi: Change hints to FI_PROGRESS_MANUAL. 2015-10-26 15:32:30 -07:00
yohann
fde8b89ceb mtl/ofi: Use OFI's representation of ANY_SRC instead of NULL. 2015-10-26 14:38:41 -07:00
yohann
4246de4508 mtl/ofi: Treat error correctly. 2015-10-26 14:38:33 -07:00
George Bosilca
2622b9d3a1 Fix minor issues in the treematch topo
based on a patch provided by Guillaume.
2015-10-25 21:38:59 -04:00
Jeff Squyres
140cf90e3e osc_rdma: minor compiler warning stomp 2015-10-23 06:21:56 -07:00
Nathan Hjelm
63e744ffc6 osc/rdma: use only a single btl registration for local state
This commit fixes a bug that can occur on Cray Gemini networks. If
multiple registrations are used for the local state then we lose the
atomicity guarantees. To avoid issues like this use only a single
registration handle for all local state on a node.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-22 15:51:19 -06:00
Nathan Hjelm
f690fc8fd5 osc/pt2pt: fix warnings
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-22 15:50:40 -06:00
Nathan Hjelm
e4219aa692 Merge pull request #1059 from hjelmn/osc_fixes
osc/rdma: bug fixes
2015-10-22 11:25:51 -06:00
Nathan Hjelm
97c9732bad osc/rdma: bug fixes
This commit fixes the following:

 - CIDs 1328491, 1328492: Dead code caused by typos in a prior
   commit.

 - Fix the calculation of dynamic memory regions. This was causing
   incorrect RMA range errors when accessing the last partial page of
   an attachment.

 - Fix a SEGV when using dynamic memory windows with local state (all
   processes on the same node).

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-22 09:49:38 -06:00
yohann
889c76634e mtl/ofi: Increase priority. 2015-10-22 08:39:36 -07:00
Nathan Hjelm
b2fa2a9bef Merge pull request #1056 from hjelmn/osc_fixes
osc/pt2pt: reset all_sync sync object before sending complete messages
2015-10-21 19:40:28 -06:00
Nathan Hjelm
864f88a2a3 osc/pt2pt: reset all_sync sync object before sending complete messages
This commit fixes a bug that occurs when a post message comes in when
sending complete messages or while waiting for all outgoing messages
to flush. In that case the post message might get incorrectly
associated with the ending sync object.

References open-mpi/ompi#1012

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-21 18:30:08 -06:00
Nathan Hjelm
08e267b811 add_procs: add threading protection for dynamic add_procs
This commit adds protection to the group, ob1, and bml endpoint lookup
code. For ob1 and the bml a lock has been added. For performance
reasons the lock is only held if a bml or ob1 endpoint does not
exist. ompi_group_dense_lookup now uses opal_atomic_cmpset to ensure
the proc is only retained by the thread that actually updates the
group.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-21 16:13:41 -06:00
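Note on 08e267b811: the "only retained by the thread that actually updates the group" idea can be sketched with a compare-and-swap. This is a minimal illustration only. It uses C11 `<stdatomic.h>` in place of opal_atomic_cmpset, and `proc_t`, `make_sentinel`, `is_sentinel`, `group_slot_lookup`, and the odd-value sentinel encoding are all hypothetical names and assumptions, not the Open MPI implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a reference-counted proc object. */
typedef struct {
    atomic_int refcount;
    int        rank;
} proc_t;

/* Assumed encoding: an odd value is a sentinel, an even value is a proc_t pointer. */
static uintptr_t make_sentinel(int rank)
{
    return ((uintptr_t) rank << 1) | (uintptr_t) 1;
}

static int is_sentinel(uintptr_t value)
{
    return (value & (uintptr_t) 1) != 0;
}

/* Replace the sentinel in *slot with `resolved`. Only the thread whose
 * compare-and-swap succeeds takes the extra reference on the proc. */
static proc_t *group_slot_lookup(atomic_uintptr_t *slot, proc_t *resolved)
{
    assert(NULL != slot && NULL != resolved);

    uintptr_t current = atomic_load(slot);
    if (is_sentinel(current)) {
        if (atomic_compare_exchange_strong(slot, &current, (uintptr_t) resolved)) {
            atomic_fetch_add(&resolved->refcount, 1);
            return resolved;
        }
        /* Lost the race: `current` now holds the pointer stored by the winner. */
    }
    return (proc_t *) current;
}
```

A second lookup on the same slot finds a real pointer and returns it without touching the reference count, so the proc is retained exactly once however many threads race on the slot.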
Nathan Hjelm
386991d590 Merge pull request #1052 from hjelmn/osc_rdma_fixes
osc/rdma: use standard verbosity levels
2015-10-21 13:21:50 -06:00
Nathan Hjelm
9476c7bbca osc/rdma: use standard verbosity levels
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-21 12:31:41 -06:00
yohann
abe5002ee9 mtl/ofi: remove threading and progress hints. 2015-10-21 10:25:08 -07:00
yosefe
cc76db8d39 ucx: reduce components priority to 5. 2015-10-21 17:38:25 +03:00
Mike Dubman
4ea13f10f6 Merge pull request #1008 from alex-mikheev/topic/ucx_support
UCX support for ompi and oshmem
2015-10-21 09:33:33 +03:00
Nathan Hjelm
763744a32c Merge pull request #1046 from hjelmn/osc_rdma_fixes
osc/rdma: bug fixes
2015-10-20 16:44:38 -06:00
Nathan Hjelm
b8ee05d352 osc/rdma: bug fixes
This commit fixes several bugs in the osc/rdma component:

 - Complete aggregated requests immediately. Completion of RMA
   requests indicates local completion anyway. This fixes a hang in
   the c_reqops test.

 - Correctly mark Rget_accumulate requests.

 - Set the local base flag correctly on the local peer.

 - Clear or set the no locks flag on the window if the value is
   changed by MPI_Win_set_info.

 - Actually update the target when using MPI_OP_REPLACE.

Fixes open-mpi/ompi#1010

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-20 15:27:15 -06:00
Ryan Grant
f60c506c68 Merge pull request #999 from tkordenbrock/topic/add.triggered.gather
coll-portals4: add gather and igather implementations that use Portals4 triggered operations
2015-10-20 14:59:09 -06:00
yosefe
a313588337 ompi: Add UCX PML. 2015-10-20 19:46:06 +03:00
Nathan Hjelm
9602484568 Merge pull request #1040 from hjelmn/mtl_priority
Change how cm's priority is calculated
2015-10-19 14:18:36 -06:00
Nathan Hjelm
53f6b57c0a pml/cm: use the priority of the mtl component
This commit changes the priority of mtl components to be relative to
pml/ob1 and updates the mtl interface to expose this priority. cm now
sets its own priority based on the priority of the selected mtl
component.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-19 12:32:42 -06:00
Nathan Hjelm
8b5810f7f7 mca/base: add priority output to mca_base_select
The mca_base_select function uses returned priorities to select the
best component/module. This priority may be of use to the caller so
pass that information back in an optional argument. If the priority is
not needed pass NULL.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-19 12:32:41 -06:00
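Note on 8b5810f7f7: the optional output argument described above follows a common C pattern, sketched here. This is not the mca_base_select signature; `component_t` and `select_best` are hypothetical names used only to show how a caller may pass NULL when the priority is not needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical component record: a name and the priority it reported. */
typedef struct {
    const char *name;
    int         priority;
} component_t;

/* Return the highest-priority component, or NULL if the list is empty.
 * If priority_out is non-NULL, the winning priority is stored there. */
static const component_t *select_best(const component_t *components, size_t count,
                                      int *priority_out)
{
    assert(NULL != components || 0 == count);

    const component_t *best = NULL;
    for (size_t i = 0; i < count; ++i) {
        if (NULL == best || components[i].priority > best->priority) {
            best = &components[i];
        }
    }

    if (NULL != best && NULL != priority_out) {
        *priority_out = best->priority;
    }
    return best;
}
```

Callers that only want the selected component pass NULL as the last argument. Callers such as a wrapper that derives its own priority from the selection pass the address of an int.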
Nathan Hjelm
bedd80214e pml/ob1: remove priority check
This commit removes code that checks the ob1 priority vs the previous
priority. The previous priority is meaningless here and may only cause
ob1 to disable itself when it shouldn't.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-19 12:32:41 -06:00
Nathan Hjelm
2fd176ac7f cm: fix selection priority
This patch removes a priority check that disables cm if the previous
pml had higher priority. The check was incorrect as coded and is
unnecessary as we finalize all but one pml anyway.

Fixes open-mpi/ompi#1035

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-19 12:32:26 -06:00
Gilles Gouaillardet
0f23037775 coll/base: fix memory allocation in mca_coll_base_alltoall_intra_basic_inplace 2015-10-19 16:47:59 +09:00
Edgar Gabriel
f97655f28e make sure the iov buffer is initialized to zero, otherwise bad things can happen for 0-byte contributions on a process. 2015-10-15 12:46:01 -05:00
Edgar Gabriel
0177918a08 limit the number of bytes used for the semaphore name depending on platform (31 bytes for MacOS, 252 for Linux) 2015-10-15 11:13:45 -05:00
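Note on 0177918a08: a platform-dependent name limit of this kind might be applied as in the sketch below. The two limits (31 and 252 bytes) come from the commit message. The macro `SEM_NAME_MAX_BYTES`, the helper `bounded_sem_name`, and the choice to treat the limit as excluding the terminating NUL are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Limits taken from the commit message; the macro name is hypothetical. */
#if defined(__APPLE__)
#define SEM_NAME_MAX_BYTES 31
#else
#define SEM_NAME_MAX_BYTES 252
#endif

/* Copy `name` into `out`, keeping at most `limit` bytes (and never more
 * than out_size - 1). Always NUL-terminates. Returns the bytes kept. */
static size_t bounded_sem_name(char *out, size_t out_size,
                               const char *name, size_t limit)
{
    assert(NULL != out && NULL != name && out_size > 0);

    size_t length = strlen(name);
    if (length > limit) {
        length = limit;
    }
    if (length > out_size - 1) {
        length = out_size - 1;
    }
    memcpy(out, name, length);
    out[length] = '\0';
    return length;
}
```

Truncation can make two long names collide, so a real caller would want the distinguishing part of the name (for example an identifier) near the front.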
Nathan Hjelm
341b60dd57 Merge pull request #1029 from kawashima-fj/pr/ob1-fin-memory-leak
pml/ob1: Fix a memory leak regarding pending FIN control messages.
2015-10-15 07:55:52 -06:00
KAWASHIMA Takahiro
4e56505202 pml/ob1: Fix a memory leak regarding pending FIN control messages.
Once a FIN control message is appended to the pending list,
the ob1 PML attempts to send the FIN again in the `mca_pml_ob1_process_pending_packets` function.
But if the PML fails to send the FIN again, the `mca_pml_ob1_send_fin`
function creates a new `mca_pml_ob1_pckt_pending_t` object and the
old object is not returned to the free list.
2015-10-15 11:15:03 +09:00
Jeff Squyres
8307330e8a Merge pull request #989 from jsquyres/pr/friendly-message-when-dynamics-disabled
Print friendly message when dynamics disabled
2015-10-14 19:52:52 -04:00
Jeff Squyres
889d80a659 mxm/yalla: disable MPI dynamic process functionality
Disable the MPI dynamic process functionality when these components
are selected to be used.
2015-10-14 13:42:56 -07:00
Nathan Hjelm
e11f014c6e osc/rdma: fix segmentation fault when running 1 ppn
This commit fixes an issue identified by @rolfv. The local peer was
not being correctly initialized when running with a single process on
a node.

This fixes open-mpi/ompi#1010

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-14 12:40:52 -06:00
Jeff Squyres
62351f442a help: remove stale help messages and files
Found by contrib/check-help-strings.pl.
2015-10-13 16:50:20 -04:00
Todd Kordenbrock
7c738fb657 coll-portals4: add gather and igather implementations that use Portals4 triggered operations
This commit adds implementations of gather and igather using
Portals4 triggered operations.  The default algorithm is linear,
but binomial can be selected using an MCA parameter -
coll_portals4_use_binomial_gather_algorithm.
2015-10-13 11:26:35 -05:00
Nathan Hjelm
d8dc5292ed Merge pull request #1002 from hjelmn/ompi_coverity
ompi: fix coverity issues
2015-10-09 12:27:41 -06:00
bosilca
1310acc83f Merge pull request #912 from bosilca/topic/coll_requests
This patch fixes the issues identified by @ggouaillardet in the IBM tests (collectives and topologies). It also improves the memory usage of OMPI, as a communicator without collective communications will never allocate the array of requests needed to coordinate the basic collective algorithms. This ticket replaced #790.
2015-10-09 11:27:07 -04:00
Nathan Hjelm
4cb42f8264 ompi: fix coverity issues
Fixes CID 715741: Logically dead code

Verified. Removed dead code.

Fixes CID 1320878: Resource leak

Free proc_list before returning.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-10-09 08:41:27 -06:00
Todd Kordenbrock
141b20d991 osc-portals4: Initialize datatype in MPI_Get_accumulate and MPI_Rget_accumulate
Fix code paths that didn't convert the MPI datatype to the
corresponding Portals4 datatype.

Thanks to Nicolas Chevalier (@shawone) for finding this bug and
submitting a patch.
2015-10-08 12:17:19 -05:00
Gilles Gouaillardet
e946c82847 Revert "coll/basic: fix segmentation fault in neighborhood collectives if the degree"
This partially reverts commit open-mpi/ompi@76204dfafe.
2015-10-08 12:00:41 -04:00
Gilles Gouaillardet
99cca2cfd3 Revert "* comment on communicator creation in mca_topo_base_dist_graph_create(...)"
This partially reverts commit open-mpi/ompi@27e4389259.
2015-10-08 12:00:41 -04:00
George Bosilca
a8bdd8f668 Don't lose the pointer to the request array. Patch provided by
@ggouaillardet.
2015-10-08 12:00:41 -04:00
George Bosilca
88492a1e12 Consistently use the request array for all modules (single array stored
in the base).
Correctly deal with persistent requests (they must be always freed when
they are stored in the request array associated with the communicator).
Always use MPI_STATUS_IGNORE for single request waiting functions.
2015-10-08 12:00:41 -04:00
George Bosilca
01b32caf98 Update the basic module to dynamically allocate the right
number of requests.

Remove unnecessary fields.
2015-10-08 12:00:41 -04:00
George Bosilca
a324602174 Never allocate a temporary array for the requests. Instead rely on the
module_data to hold one with the largest necessary size. This array is
only allocated when needed, and it is released upon communicator
destruction.
2015-10-08 12:00:41 -04:00
Ryan Grant
8134ba76f1 Merge pull request #998 from tkordenbrock/topic/fix.incorrect.ompi_proc.cast
Looks good to me.

mtl-portals4: fix bug in the Portals4 get_peer family
2015-10-08 08:38:16 -06:00
Todd Kordenbrock
88d79efd9f mtl-portals4: fix bug in the Portals4 get_peer family
The Portals4 get_peer family incorrectly cast the ompi_proc_t to
ptl_process_t and returned that as the peer.  The ptl_process_t is
actually found in the endpoint array.  This commit fixes the
Portals4 get_peer family to return the dereferenced endpoint
pointer.
2015-10-08 07:57:48 -05:00
Todd Kordenbrock
f33b0c1cdf coll-portals4: allreduce: remove extra %d from error message. 2015-10-08 07:57:33 -05:00
Devendar Bureddy
72f98ccf6c HCOLL: Enable alltoall interface 2015-10-07 08:00:04 +03:00
Edagr Gabriel
8af80cd02c update the interfaces of the sharedfp addproc component to match the changes made in the const commit. 2015-10-06 07:54:38 -05:00
Gilles Gouaillardet
de8de65b07 coll/tuned: remove unused prototypes from coll_tuned.h 2015-10-06 09:07:48 +09:00
Mike Dubman
e8d7373b14 COLL/FCA: revert to prev barrier if called from finalize
FCA barrier may not complete if FCA progress is not called periodically.
PMI/PMI2 API that can be used in rte barrier has no provision for calling
external progress function.

So it is possible that during finalize some ranks will be stuck
in fca barrier while others are in PMI barrier.
2015-10-04 09:40:19 +03:00
Nathan Hjelm
5122327727 fcoll/two_phase: fix new coverity errors
Fix CID 1325467: use after free

Remove extra free of aggregator_list.

Fix CID 1325466: resource leak

Fix typo in prior coverity fix.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-10-02 21:38:31 -06:00
Nathan Hjelm
eb79edff33 Merge pull request #963 from hjelmn/ompi_coverity
fcoll/two_phase: fix coverity errors
2015-10-02 10:22:24 -06:00
Devendar Bureddy
243b75aa80 HCOLL: Add alltoallv interface 2015-10-02 01:51:33 +03:00
Nathan Hjelm
95b95e19af fcoll/dynamic: fix coverity errors
Fixes CID 72320: Explicit NULL dereferenced

On error it is possible that the blocklen_per_process array is
NULL. Change the NULL check before the free to check for non-NULL on
the array not the array element. Also clean up allocation of this
array to use calloc instead of malloc + setting each element to NULL.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-01 14:38:09 -06:00
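Note on 95b95e19af: the two points in this message (test the array pointer, not an element, and let calloc produce the NULL entries) can be sketched as follows. `alloc_rows` and `free_rows` are hypothetical helpers, not code from fcoll/dynamic. The sketch also assumes, as the commit does, that zeroed memory reads back as NULL pointers, which holds on mainstream platforms.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate an array of row pointers. calloc zero-fills it, so every
 * entry starts out NULL without a separate initialization loop. */
static int **alloc_rows(int nrows)
{
    assert(nrows > 0);
    return calloc((size_t) nrows, sizeof(int *));
}

/* Release the rows and the array. The guard tests the array itself:
 * testing rows[0] would dereference a NULL array on the error path. */
static void free_rows(int **rows, int nrows)
{
    if (NULL == rows) {
        return;
    }
    for (int i = 0; i < nrows; ++i) {
        free(rows[i]);   /* free(NULL) is a no-op, so unset rows are fine */
    }
    free(rows);
}
```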
Nathan Hjelm
09df7aa205 fcoll/two_phase: fix coverity errors
Fixes CIDs 72300, 72344, 1196764-1196768, 72300: Resource leaks

Multiple allocated arrays are going out of scope at the end of
mca_fcoll_two_phase_file_write_all. Free these arrays. Also removed
the extraneous NULL checks since free (NULL) is safe in C.

Change returns to goto exit where the allocated resources are freed.

Fixes CIDs 72285-72292, 72297, 72298: Resource leaks

Change all appropriate return statements to goto exit to ensure that
all resources are freed. Also removed the NULL checks since free
(NULL) is safe in C.

Fixes CIDs 72295, 72296: Resource leaks

Moved free of requests and recv_types to after exit label. This will
ensure these are freed on error.

Also added a loop and statement to free send_buf which is going out of
scope at the end of the function.

Fixes CIDs 72336-72240, 735197, 735198: Resource leaks

Moved the exit label to before the resources are released and
changed all appropriate return statements to goto exit. Also removed
extraneous NULL checks because free (NULL) is safe in C.

Fixes CIDs 72341, 72343, 1196805-1196809: Resource leaks

Free all resources after exit label and change return statements to
goto exit to ensure all resources are freed on error.

Fixes CID 1269973: Unused value

Check return code of ompi_request_wait_all. If it fails jump to the
exit.

Fixes CID 714119: Dereference before NULL check

Wrong value checked in conditional.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-01 14:38:09 -06:00
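Note on 09df7aa205: the recurring fix in this message is the single-exit cleanup idiom, where every error path jumps to one label that frees everything. A minimal sketch follows. `compute_displacements` is a hypothetical function written for this illustration and is unrelated to the actual two_phase code.

```c
#include <assert.h>
#include <stdlib.h>

/* Sum `counts` via two scratch arrays. Returns 0 on success, -1 on
 * allocation failure, -2 on a negative count. Every path leaves through
 * the exit label, so neither array can leak. */
static int compute_displacements(const int *counts, int n, int *total_out)
{
    int ret = 0;
    int *displs = NULL;
    int *scratch = NULL;

    assert(NULL != counts && NULL != total_out && n > 0);

    displs = calloc((size_t) n, sizeof(int));
    if (NULL == displs) {
        ret = -1;
        goto exit;
    }

    scratch = calloc((size_t) n, sizeof(int));
    if (NULL == scratch) {
        ret = -1;
        goto exit;
    }

    for (int i = 0; i < n; ++i) {
        if (counts[i] < 0) {
            ret = -2;
            goto exit;   /* error in the middle: both arrays are still freed */
        }
        scratch[i] = counts[i];
        displs[i] = (0 == i) ? 0 : displs[i - 1] + scratch[i - 1];
    }
    *total_out = displs[n - 1] + scratch[n - 1];

exit:
    /* No NULL checks needed: free(NULL) is defined to do nothing. */
    free(scratch);
    free(displs);
    return ret;
}
```

Initializing every pointer to NULL at the top is what makes the unconditional free calls after the label safe, whichever allocation failed.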
Nathan Hjelm
5fd9c35957 osc/rdma: fix incorrect assert
This commit fixes MTT failures in debug builds.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-29 15:37:40 -06:00
Nathan Hjelm
7b8ec48c68 osc/rdma: fix typos in arguments to btl_atomic_[f]op
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-29 08:09:00 -06:00
Nathan Hjelm
12bd300c40 Merge pull request #929 from hjelmn/add_procs
Update add_procs support
2015-09-28 17:29:13 -06:00
Nathan Hjelm
6611c000c9 Fix coverity warnings
Fix CID 1315271: Constant expression result

The intent of this conditional is to not produce a peruse event for
probe or mprobe requests. Coverity is correct that the expression is
always true. Changed the || to && to fix. Also moved the conditional
within an OMPI_WANT_PERUSE to ensure the conditional is not evaluated
if peruse is disabled.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-28 15:35:25 -06:00
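Note on 6611c000c9: the "always true" expression reported by Coverity is the classic mistake of joining two inequality tests with ||. The sketch below reproduces the shape of the bug and the fix. The enum `req_type_t` and both function names are hypothetical and do not come from the ob1 sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request kinds, for illustration only. */
typedef enum { REQ_RECV, REQ_PROBE, REQ_MPROBE } req_type_t;

/* Buggy form: no value can equal both constants at once, so at least one
 * inequality always holds and the result is always true. */
static bool should_emit_event_buggy(req_type_t type)
{
    return type != REQ_PROBE || type != REQ_MPROBE;
}

/* Fixed form: emit an event only when the request is neither a probe
 * nor an mprobe. */
static bool should_emit_event(req_type_t type)
{
    assert(type >= REQ_RECV && type <= REQ_MPROBE);
    return type != REQ_PROBE && type != REQ_MPROBE;
}
```

By De Morgan's law the intended test is the negation of "probe or mprobe", which is "not probe and not mprobe", hence the && in the fixed version.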
bosilca
0a3c54ed61 Merge pull request #942 from bosilca/topic/global_request
Fix for "Random errors on MPI_COMPARE_AND_SWAP with pt2pt OSC of Open MPI master" (#933)
2015-09-27 16:56:29 +02:00
Nathan Hjelm
552e1b59a5 osc/rdma: fix coverity issues
Fixes CID 1324730, 1327429, 1324728, 1196633, 1324731, 1324727, and
1196632: Logically dead code

OMPI_OSC_RDMA_REQUEST_ALLOC can never return a NULL request. Removed
unnecessary NULL checks.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-26 12:45:14 -06:00
Nathan Hjelm
ebf19ac5eb osc/pt2pt: fix coverity issues
Fixed CID 1269712, 1269709, 1269706, 1269703, 1269694: Logically dead code

Remove extra NULL check as OMPI_OSC_PT2PT_REQUEST_ALLOC can never set the
request to NULL.

Fixes CID 1269668: Unchecked return value

False positive. Add (void) to indicate we do not care about the return code
from opal_hash_table_get_uint32.

Fixes CID 1324726: Free of address-of expression

Do not free lock if it was not allocated.

Fixes CID 1269658: Free of address-of expression

This will never happen, but because op is always a built-in op there is no
reason to retain/release it anyway.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-26 11:18:22 -06:00
George Bosilca
01d8e23ccc Fix the random errors related to the recursive sends and receives
identified by Fujitsu.
2015-09-26 00:44:51 +02:00
Nathan Hjelm
f84716fcd0 Merge pull request #941 from hjelmn/osc_pt2pt_fix
osc/pt2pt: fix heterogeneous build
2015-09-25 08:07:09 -06:00
Nathan Hjelm
ae7f47e04d osc/pt2pt: fix heterogeneous build
Fixes #940

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-25 00:15:02 -06:00
Todd Kordenbrock
3e63a3458c portals4: add support for dynamic add_procs() to all Portals4 components
In the default mode of operation, the Portals4 components support
dynamic add_procs().

The Portals4 components have two alternate modes (flow control and
logical-to-physical) that require knowledge of all procs at startup.
In these modes, mtl-portals4 sets the MCA_MTL_BASE_FLAG_REQUIRE_WORLD
flag and btl-portals4 sets the MCA_BTL_FLAGS_SINGLE_ADD_PROCS flag
to tell the PML that we need all the procs in one add_procs() call.
2015-09-24 22:12:57 -05:00
Nathan Hjelm
248212276d osc/sm: fix remaining coverity issues
Fixes CID 1324870: Memory - illegal accesses (USE_AFTER_FREE)

Free osc module after calling destruct on the lock.

Fixes CID 1324868: Integer handling issues (OVERFLOW_BEFORE_WIDEN)
Fixes CID 1324867: Integer handling issues (OVERFLOW_BEFORE_WIDEN)

Explicitly cast to uint64_t to ensure the widen happens before an overflow
can occur.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-24 15:55:01 -06:00
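Note on 248212276d: OVERFLOW_BEFORE_WIDEN means a product is computed in a narrow type and only then converted to a wider one. The sketch below shows the explicit cast the message describes next to the narrow form. `segment_offset` and `segment_offset_narrow` are hypothetical helpers, and the contrast assumes the usual 32-bit `int`.

```c
#include <assert.h>
#include <stdint.h>

/* Widen first: the cast makes the multiplication happen in 64 bits. */
static uint64_t segment_offset(int rank, uint32_t segment_size)
{
    assert(rank >= 0);
    return (uint64_t) rank * segment_size;
}

/* For contrast: the product is formed in 32 bits, wraps modulo 2^32,
 * and only the already-wrapped result is widened on return. */
static uint64_t segment_offset_narrow(int rank, uint32_t segment_size)
{
    assert(rank >= 0);
    return (uint32_t) rank * segment_size;
}
```

The narrow variant uses unsigned arithmetic so that the wraparound is well defined. With signed operands the same mistake would be undefined behavior.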
Nathan Hjelm
4ab4c4d7e9 Merge pull request #930 from hjelmn/ompi_coverity
coll/libnbc: fix coverity errors
2015-09-23 17:07:34 -06:00
Nathan Hjelm
54a4061d88 Add support for detecting when dynamic add_procs is not possible
This commit adds support to the pml, mtl, and btl frameworks for
components to indicate at runtime that they do not support the new
dynamic add_procs behavior. At the high end the lack of dynamic
add_procs support is signalled by the pml using the new pml_flags
member to the pml module structure. If the
MCA_PML_BASE_FLAG_REQUIRE_WORLD flag is set MPI_Init will generate the
ompi_proc_t array passed to add_proc from ompi_proc_world () instead
of ompi_proc_get_allocated ().

Both cm and ob1 have been updated to detect if the underlying mtl and
btl components support dynamic add_procs.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-23 16:22:05 -06:00
Nathan Hjelm
2c89c7f47d ompi/proc: add function to get all allocated procs
This commit adds two new functions:

 - ompi_proc_get_allocated - Returns all procs in the current job that
   have already been allocated. This is used in init/finalize to
   determine which procs to pass to add_procs/del_procs.

 - ompi_proc_world_size - returns the number of processes in
   MPI_COMM_WORLD. This may be removed in favor of callers just
   looking at ompi_process_info.

The behavior of ompi_proc_world has been restored to return
ompi_proc_t's for all processes in the current job. The use of this
function is discouraged.

Code that was using ompi_proc_world() has been updated to make use of
the new functions to avoid the memory overhead of ompi_comm_world ().

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-23 16:22:05 -06:00
Nathan Hjelm
30f8d0b038 coll/libnbc: fix coverity errors
Fix CID 1196812: Resource Leak

dsts array was leaked on error.

Fix CID 710565: Copy-paste error

The line in question (nbc:513) is indeed a copy-paste error.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-23 16:14:49 -06:00
William Throwe
80bb41a079 ROMIO configure looks for lstat in wrong header
ROMIO configure looks for lstat in wrong header

The ROMIO configure script checks for a declaration of lstat in
unistd.h, but, at least on the Linux machines I checked, lstat is in
sys/stat.h.  (The detection failure led to a linker error when building
ROMIO as part of OpenMPI on one of my admittedly strangely configured
machines, somehow.)  It appears from the man page that either location
is possible, so check both.

(cherry picked from mpich/mpich@7b8bd055df)

Signed-off-by: Rob Latham <robl@mcs.anl.gov>
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-23 11:56:53 -06:00
Nathan Hjelm
db74fa9d0f bml/r2: fix memory leak
The add_procs change made some assumptions in the bml/r2 add_procs
wrong. This led to del_procs never being called. I removed the logic
that checks the ompi_proc_t reference count and removed an unnecessary
allocation. The allocation only makes sense if we pass more than a
single proc at a time to the btl del_procs.

This commit also ensures that the btl del_procs is called if the
endpoint is in the btl_rdma array but not the btl_send array.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-23 10:45:13 -06:00
Nathan Hjelm
ee5810813b osc/pt2pt: fix regression in pscw sync on 0 size groups
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-22 17:09:00 -06:00
Nathan Hjelm
f6920aa916 osc/rdma: check for usable btls during query
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-22 17:08:28 -06:00
Nathan Hjelm
903762e194 osc/sm: fix pscw synchronization
The osc/sm component was using a simple counter to determine if all
expected posts had arrived to start a PSCW access epoch. This is
incorrect as a post may arrive from a peer that isn't part of the
current start group. There are many ways this could have been fixed.
This commit adds an n^2 bitmap. When a process posts it sets a bit in
the bitmap associated with the access rank to indicate the post is
complete. The access rank checks for and clears the bits associated
with all the processes in the start group.

The bitmap requires comm_size ^ 2 bits of space. This should be
manageable as most nodes have relatively small numbers of processes. If
this changes another algorithm can be implemented.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-22 16:00:27 -06:00
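Note on 903762e194: the n^2 bitmap described in this message can be sketched as below. This is a single-process illustration only. The real component keeps its state in shared memory and would need atomic bit operations, and `post_bitmap_t` and the `post_bitmap_*` functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* comm_size x comm_size bits: row = access rank, column = posting rank. */
typedef struct {
    int       comm_size;
    uint64_t *words;
} post_bitmap_t;

static post_bitmap_t *post_bitmap_create(int comm_size)
{
    assert(comm_size > 0);
    post_bitmap_t *bitmap = malloc(sizeof(*bitmap));
    if (NULL == bitmap) {
        return NULL;
    }
    size_t nbits = (size_t) comm_size * (size_t) comm_size;
    bitmap->comm_size = comm_size;
    bitmap->words = calloc((nbits + 63) / 64, sizeof(uint64_t));
    if (NULL == bitmap->words) {
        free(bitmap);
        return NULL;
    }
    return bitmap;
}

static void post_bitmap_destroy(post_bitmap_t *bitmap)
{
    if (NULL != bitmap) {
        free(bitmap->words);
        free(bitmap);
    }
}

static size_t bit_index(const post_bitmap_t *bitmap, int access_rank, int poster)
{
    return (size_t) access_rank * (size_t) bitmap->comm_size + (size_t) poster;
}

/* Posting side: record that `poster` has posted to `access_rank`. */
static void post_bitmap_post(post_bitmap_t *bitmap, int access_rank, int poster)
{
    size_t index = bit_index(bitmap, access_rank, poster);
    bitmap->words[index / 64] |= UINT64_C(1) << (index % 64);
}

/* Access side: succeed only if every rank in the start group has posted,
 * then clear exactly those bits. Posts from other ranks are left alone. */
static bool post_bitmap_try_start(post_bitmap_t *bitmap, int access_rank,
                                  const int *group, int group_size)
{
    for (int i = 0; i < group_size; ++i) {
        size_t index = bit_index(bitmap, access_rank, group[i]);
        if (0 == (bitmap->words[index / 64] & (UINT64_C(1) << (index % 64)))) {
            return false;
        }
    }
    for (int i = 0; i < group_size; ++i) {
        size_t index = bit_index(bitmap, access_rank, group[i]);
        bitmap->words[index / 64] &= ~(UINT64_C(1) << (index % 64));
    }
    return true;
}
```

Unlike a plain counter, a post from a rank outside the start group sets a bit that the access rank never inspects, so it cannot be mistaken for one of the expected posts.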
Nathan Hjelm
036395dc0f osc/pt2pt: fix typos
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-22 10:30:01 -06:00
Nathan Hjelm
974061c38f osc: fixed issues identified by coverity
Fix CID 1324733: Null pointer dereferences  (FORWARD_NULL)
Fix CID 1324734: Null pointer dereferences  (FORWARD_NULL)
Fix CID 1324735: Null pointer dereferences  (FORWARD_NULL)
Fix CID 1324736: Null pointer dereferences  (FORWARD_NULL)
Fix CID 1324737: Null pointer dereferences  (FORWARD_NULL)
Fix CID 1324751: Memory - illegal accesses  (USE_AFTER_FREE)
Fix CID 1324750: (USE_AFTER_FREE)
Fix CID 1324749: Memory - corruptions  (USE_AFTER_FREE)
Fix CID 1324748: Memory - illegal accesses  (USE_AFTER_FREE)
Fix CID 1324747: (USE_AFTER_FREE)
Fix CID 1324746: Memory - corruptions  (USE_AFTER_FREE)

Add missing return on an error path.

Fix CID 1324745: Code maintainability issues  (UNUSED_VALUE)

Ignore return code from barrier. It was not being used anyway.

Fix CID 1324738: Null pointer dereferences  (FORWARD_NULL)
Fix CID 1324741: Null pointer dereferences  (REVERSE_INULL)

module->selected_btl can not be NULL in osc/rdma during normal
operation. Removed the unnecessary NULL check.

Fix CID 1324752: Memory - illegal accesses  (USE_AFTER_FREE)

Move ompi_osc_pt2pt_module_lock_remove to before the lock is freed.

Fix CID 1324744: Uninitialized variables  (UNINIT)
Fix CID 1324743: Uninitialized variables  (UNINIT)

This array is not used uninitialized but there is no reason not to use
calloc here to silence the warning.

The following CID is a false positive: 1324742. I will mark it such in
coverity.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-22 09:23:39 -06:00
Rolf vandeVaart
2c51faa58d Fix warnings due to missing const 2015-09-21 14:18:44 -04:00
Nathan Hjelm
60c2b0df48 Merge pull request #903 from hjelmn/new_osc_rdma
osc/rdma: add true RDMA one-sided component
2015-09-21 10:29:11 -06:00
Nathan Hjelm
88100ad670 Merge pull request #902 from hjelmn/new_osc
osc/pt2pt: reduce memory footprint of windows
2015-09-21 10:28:41 -06:00
Edgar Gabriel
01fcfb08fe do not set the contiguous flag in two_phase_file_read_all. This optimization
needs some more debugging for the two_phase component, and is disabled
for two_phase_file_write_all as well.
2015-09-18 09:30:50 -05:00
Edgar Gabriel
3734a38370 this file should have been part of the previous commit for removing io_ompio_nbc.[ch] 2015-09-18 09:28:25 -05:00
Edgar Gabriel
cf46a6bd4d remove the io_ompio_nbc.[ch] files, they are not used anymore at this point in time. 2015-09-18 09:26:25 -05:00
Gilles Gouaillardet
a611274704 pml: fix commit open-mpi/ompi@6e6a3e965c
do not use the const modifier for allocator nor recv buffers
2015-09-18 09:54:18 +09:00
Jeff Squyres
567c9e3a5b mtl_ofi_component.c: add missing argv.h header 2015-09-17 10:05:05 -07:00
Nathan Hjelm
d8df9d414d osc/rdma: add true RDMA one-sided component
This commit adds support for performing one-sided operations over
supported hardware (currently Infiniband and Cray Gemini/Aries). This
component is still undergoing active development.

Current features:

 - Use network atomic operations (fadd, cswap) for implementing
   locking and PSCW synchronization.

 - Aggregate small contiguous puts.

 - Reduced memory footprint by storing window data (pointer, keys,
   etc) at the lowest rank on each node. The data is fetched as each
   process needs to communicate with a new peer. This is a trade-off
   between the performance of the first operation on a peer and the
   memory utilization of a window.

TODO:

 - Add support for the accumulate_ops info key. If it is known that
   the same op or same op/no op is used it may be possible to use
   hardware atomics for fetch-and-op and compare-and-swap.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-16 15:01:33 -06:00
Nathan Hjelm
fd42343ff0 osc/pt2pt: reduce memory footprint of window
This commit updates osc/pt2pt to allocate peer object as they are
needed rather than all at once. Additionally, to help improve the
memory footprint a new synchronization structure has been added.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-16 13:01:56 -06:00
George Bosilca
02624bd0b6 Fix all treematch issues identified by Coverity. 2015-09-15 23:49:11 -04:00
George Bosilca
6ab5f68fc3 indentation. 2015-09-15 22:46:13 -04:00
Ralph Castain
c1bbbb5e2f Remove the last involvement of the OOB system from the MPI layer, remove the no-longer-needed usock/oob component, and have procs no longer open the RML, OOB, ROUTED, and GRPCOMM frameworks as PMIx now provides all required app-mpirun cmds 2015-09-15 13:08:35 -07:00
Nathan Hjelm
898a0a038c bml/r2: fix coverity CID 1323765
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-11 09:39:10 -06:00
Gilles Gouaillardet
a1627feaf7 coll/ml, bcol: fix prototypes (e.g. use the const modifier) 2015-09-11 13:20:44 +09:00
Nathan Hjelm
ad3a2ef6cc silence warnings introduced by add_procs merge
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 16:33:52 -06:00
Nathan Hjelm
987e865c99 mtl/psm2: add support for dynamic add_procs
Add an accessor for the proc_endpoints[OMPI_PROC_ENDPOINT_TAG_MTL]
member of the ompi_proc_t structure. This accessor calls add_procs
with the ompi_proc_t if the member is NULL.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-10 08:55:55 -06:00
Nathan Hjelm
8df9b1d40d mtl/psm: add support for dynamic add_procs
Add an accessor for the proc_endpoints[OMPI_PROC_ENDPOINT_TAG_MTL]
member of the ompi_proc_t structure. This accessor calls add_procs
with the ompi_proc_t if the member is NULL. Tested on an infinipath
system with InfiniPath_QLE7340 HCAs.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-10 08:55:55 -06:00
Nathan Hjelm
5b7943db78 ompi/group: do not allocate ompi_proc_t's on group union/difference
This commit modifies the ompi_group_t union/difference code to compare/copy the
raw group values. This will either be an ompi_proc_t or a sentinel value. This
commit also adds helper functions to convert between opal process names and
sentinel values.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-10 08:55:55 -06:00
Nathan Hjelm
d8b0a6efda Remove use of ompi_comm_peer_lookup in osc/sm
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
Nathan Hjelm
a41889112c Remove calls to ompi_group_peer_lookup in coll/sm and coll/fca
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
Nathan Hjelm
b4a0d40915 pml/ob1: Add support for dynamically calling add_procs
This commit contains the following changes:

 - pml/ob1: use the bml accessor function when requesting a bml
   endpoint. this will ensure that bml endpoints are only created when
   needed. for example, a bml endpoint is not requested and not
   allocated when receiving an eager message from a peer.

 - pml/ob1: change the pml_procs array in the ob1 communicator to a
   proc pointer array. at the cost of a single level of extra
   redirection this will allow us to allocate pml procs on demand.

 - pml/ob1: add an accessor function to access the pml proc structure
   for a given peer. this function will allocate the proc if it
   doesn't already exist.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
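The on-demand allocation described in the pml/ob1 commit above can be sketched as follows. This is a minimal illustration only: `peer_proc_t`, `comm_procs_t`, and the `comm_procs_*` functions are hypothetical stand-ins, not the real ob1 types or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-peer PML structure (not the ob1 type). */
typedef struct peer_proc {
    int rank;
    unsigned long expected_sequence;
} peer_proc_t;

/* Hypothetical communicator state: an array of pointers, not of structs. */
typedef struct comm_procs {
    size_t size;          /* number of peers in the communicator */
    peer_proc_t **procs;  /* one slot per peer, NULL until first use */
} comm_procs_t;

/* Allocate only the pointer array; no peer structure exists yet. */
static int comm_procs_init(comm_procs_t *c, size_t size)
{
    c->procs = malloc(size * sizeof(*c->procs));
    if (NULL == c->procs) {
        return -1;
    }
    for (size_t i = 0; i < size; ++i) {
        c->procs[i] = NULL;
    }
    c->size = size;
    return 0;
}

/* Accessor: return the peer structure, allocating it on first access. */
static peer_proc_t *comm_procs_get(comm_procs_t *c, size_t rank)
{
    assert(rank < c->size);
    if (NULL == c->procs[rank]) {
        peer_proc_t *p = calloc(1, sizeof(*p));
        if (NULL == p) {
            return NULL;
        }
        p->rank = (int) rank;
        c->procs[rank] = p;
    }
    return c->procs[rank];
}

/* Release every peer that was actually created, then the array itself. */
static void comm_procs_fini(comm_procs_t *c)
{
    for (size_t i = 0; i < c->size; ++i) {
        free(c->procs[i]);
    }
    free(c->procs);
    c->procs = NULL;
    c->size = 0;
}
```

The extra pointer dereference is the "single level of extra redirection" the commit mentions; in exchange, memory is spent only on peers that are actually contacted.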
Nathan Hjelm
6fa6513003 bml: Add support for dynamically calling add_procs
This commit contains the following changes:

 - bml: add a function to add a single process. this function is
   intended to remove the need to maintain an opal_bitmap_t as it is
   irrelevant for a single proc. BTLs will need to be updated to
   either 1) ignore the return code from opal_bitmap_set_bit or 2) not
   call the function if the reachability bitmap is NULL.

 - bml: add an inline accessor function for getting the bml endpoint
   for a peer proc. this function will either 1) return the cached bml
   endpoint, or 2) create the endpoint and call add_proc with all
   available BTL modules.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
Gilles Gouaillardet
fe351f6801 io: do not cast away the const modifier when this is not necessary
update the io framework and mpi c bindings
2015-09-09 09:18:58 +09:00
Gilles Gouaillardet
e01bac962f coll: do not cast away the const modifier when this is not necessary
update the coll framework and mpi c bindings
2015-09-09 09:18:57 +09:00
Gilles Gouaillardet
6e6a3e965c pml: do not cast away the const modifier when this is not necessary
update the pml framework and mpi c bindings
2015-09-09 09:18:57 +09:00
Gilles Gouaillardet
43ef261d46 topo: do not cast away the const modifier when this is not necessary
update the topo framework and mpi c bindings
2015-09-09 09:18:57 +09:00
Jeff Squyres
bc9e5652ff whitespace: purge whitespace at end of lines
Generated by running "./contrib/whitespace-purge.sh".
2015-09-08 09:47:17 -07:00
Edgar Gabriel
c83e6ad0c8 fix Coverity warnings 1322865 and 72136 2015-09-08 09:15:57 -05:00
Gilles Gouaillardet
c404e98dce coll/ml: silence warnings (incorrect callback prototype) 2015-09-07 14:56:49 +09:00
Gilles Gouaillardet
56f8a7b840 coll/ml: declare a global variable as static to avoid an uninitialized common symbol. 2015-09-07 14:56:03 +09:00
Jeff Squyres
794ee4a604 treematch: remove stale test
This test was accidentally left over from
open-mpi/ompi@d97bc29102 that prevented
the treematch component from building.
2015-09-05 05:02:30 -07:00
rhc54
665b30376a Merge pull request #868 from rhc54/topic/hwloc
Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given
2015-09-04 17:58:07 -07:00
Ralph Castain
d97bc29102 Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given 2015-09-04 16:54:40 -07:00
rhc54
d45ccda813 Merge pull request #866 from rhc54/topic/updatepmix
Update PMIx support
2015-09-04 11:09:36 -07:00
Ralph Castain
f6948c2bb4 Sync with PMIx master 43e45c3. Get multi-node publish/lookup/unpublish working 2015-09-04 10:07:17 -07:00
Pavel Shamis / Pasha
c3446f363b Merge pull request #859 from shamisp/topic/ml_soft_disable
ML: Replace opal ignore with a zero priority
2015-09-04 12:37:37 -04:00
Pavel Shamis (Pasha)
32c69630ad ML: Replace opal ignore with a zero priority
The priority set by default to 0. As a result component open reports
an error and the component is not loaded (no resources allocated).
2015-09-04 11:28:47 -04:00
yohann
404393b9d7 mtl/ofi: Minor code cleanup. 2015-09-03 15:04:55 -07:00
yohann
a8cac09769 mtl/ofi: Renamed macro to prevent clash with FI_ namespace. 2015-09-03 14:42:45 -07:00
yohann
7adb9b7ab4 mtl/ofi: Handle -FI_EAGAIN on send and recv operations. 2015-09-03 10:47:00 -07:00
Edgar Gabriel
c9710660af Merge pull request #863 from edgargabriel/topic/fcoll-static-cleanup
Topic/fcoll static cleanup
2015-09-03 11:21:02 -05:00
Edgar Gabriel
a96a15a83c re-enable the contiguous buffer optimization similarly to the dynamic component. Passes all hdf5 tests and our own test suite.
2015-09-03 10:13:03 -05:00
Edgar Gabriel
8007effc93 code cleanup for static component, similarly to the dynamic one 2015-09-03 10:12:45 -05:00
Edgar Gabriel
ac3a01c39c Silence Coverity warnings 1321702, 1321701, 1321700, 72331, 72330, 72327, 72326, 72325, 2015-09-03 09:10:25 -05:00
Ralph Castain
a772b46c15 Bring the MPI_Publish and friends online 2015-09-02 12:04:07 -07:00
Edgar Gabriel
e95d01be97 Merge pull request #847 from edgargabriel/topic/fcoll-dynamic-cleanup
Topic/fcoll dynamic cleanup
2015-09-01 16:10:55 -05:00
Nathan Hjelm
2a8cc5e637 osc/pt2pt: remove outstanding lock only after lock/flush ack received
fixes #840

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-01 10:54:47 -06:00
Edgar Gabriel
82efc23e8d clean up indenting and tabs/spaces of fcoll_static_file_read/write_all 2015-09-01 09:39:33 -05:00
Edgar Gabriel
a1778406d6 Re-enable the contiguous buffer optimization to the read_all and the write_all routines.
After long debugging, I found last week the reason this optimization originally broke
some hdf5 tests. We now pass the hdf5 test suite with the optimization being actively used.
2015-09-01 09:29:07 -05:00
Edgar Gabriel
c2c44b11dc Code cleanup for dynamic read_all and write_all
Specifically:
 - reduce the number of realloc's and malloc's by moving
   some arrays out of the cycle loop, if we know that their
   size is not changing
 - store the rank of the aggregator in a separate variable to avoid
   continuous dereferencing
 - change the wait_all logic in write_all to use a fixed number of requests
   (even if they are MPI_REQUEST_NULL)
 - fix the timing to consider the two initial allgather and the one
   allgatherv operation to be a part of it
 - add more comments.
2015-09-01 09:29:07 -05:00
Edgar Gabriel
cf1e4e0d35 step 0: clean up indenting and space vs. tabs 2015-09-01 09:29:07 -05:00
Gilles Gouaillardet
21642a2407 osc: do not cast away the const modifier when this is not necessary
update the osc framework and mpi c bindings
2015-08-31 10:34:05 +09:00
Gilles Gouaillardet
21b1e7f8c5 mpi conformance: fix prototypes
- MPI_Compare_and_swap
- MPI_Fetch_and_op
- MPI_Raccumulate
- MPI_Win_detach

Thanks to Michael Knobloch and Takahiro Kawashima for bringing this
to our attention
2015-08-31 10:34:05 +09:00
Ralph Castain
cf6137b530 Integrate PMIx 1.0 with OMPI.
Bring Slurm PMI-1 component online
Bring the s2 component online

Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways.

Bring the OMPI pubsub/pmi component online

Get comm_spawn working again

Ensure we always provide a cpuset, even if it is NULL

pmix/cray: adjust cray pmix component for pmix

Make changes so cray pmix can work within the integrated
ompi/pmix framework.

Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet

Cleanup comm_spawn - procs now starting, error in connect_accept

Complete integration
2015-08-29 16:04:10 -07:00
Edgar Gabriel
f214ccf499 fix the merge algorithm in the individual sharedfp component, which could
lead to file inconsistency in case of identical timestamps
Also fixes a potential buffer size problem.
2015-08-26 11:22:54 -05:00
Edgar Gabriel
423114e168 minor formatting fix. 2015-08-26 11:20:46 -05:00
Nathan Hjelm
f451876058 Merge pull request #825 from hjelmn/white_space_purge
periodic trailing whitespace purge
2015-08-25 19:23:52 -06:00
Todd Kordenbrock
25c48b96bb Merge pull request #819 from tkordenbrock/allow-atomics-upto-max_fetch_atomic_size
osc-portals4: allow atomic ops on datatypes that are max_fetch_atomic_size bytes in length
2015-08-25 09:25:27 -05:00
Edgar Gabriel
70078175ee fix Coverity warning 72107 2015-08-25 09:23:37 -05:00
Edgar Gabriel
a73f9470e0 fix Coverity warning 1269829 2015-08-25 09:22:48 -05:00
Edgar Gabriel
6f2e8d2073 last night's Coverity fix introduced a new Coverity complaint. This commit tries to fix the new complaint. 2015-08-25 08:46:38 -05:00
Edgar Gabriel
db2d37ad93 correctly free some arrays in case of an error. This fixes a whole bunch of Coverity warnings. 2015-08-24 14:13:37 -05:00
Nathan Hjelm
156ce6af21 periodic whitespace purge
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-24 09:32:33 -06:00
Edgar Gabriel
58bd0c76b8 fix Coverity warning CID 1317091 (properly freeing variables in case of an error) 2015-08-24 08:40:10 -05:00
--quiet
1e9227765a ofi mtl: also link in mtl_ofi_LIBS in the static case 2015-08-20 10:40:46 -07:00
Edgar Gabriel
4be20b119f bring the addproc component up to date with the fileview changes 2015-08-20 09:30:58 -05:00
Edgar Gabriel
8b84da5e35 bring the lockedfile component up to date with the fileview changes. 2015-08-20 09:26:30 -05:00
Edgar Gabriel
b0461f8d3c the back pointer from the ompio_file structure to the ompi_file_t structure
has to be set earlier in case the user disables the lazy_open option.
2015-08-19 17:11:42 -05:00
Edgar Gabriel
096fe78d73 the offset provided to the read_at/write_at routines has to be a multiple of the etype. 2015-08-19 17:11:42 -05:00
Edgar Gabriel
7e370948c1 first cut on the fileview for shared filepointers fix. 2015-08-19 17:11:42 -05:00
yohann
bcc10fbcd4 mtl/ofi: remove redundant code. 2015-08-19 13:13:59 -07:00
Yossi Itigin
f9e2ede47f Merge pull request #816 from yosefe/topic/yalla-fix-on-demand-map
yalla: fix passing on-demand mapping config to mxm.
2015-08-19 17:25:30 +03:00
Gilles Gouaillardet
646b9943e8 topo/treematch: initialize the global_bl symbol 2015-08-19 10:39:17 +09:00
Edgar Gabriel
1b45712595 bring the addproc component up to date with support for split collectives. No pr required
for this commit, since the addproc component is not part of v2.x
2015-08-18 12:17:46 -05:00
Todd Kordenbrock
10cf64373a osc-portals4: allow atomic ops on datatypes that are max_fetch_atomic_size bytes in length
Portals4 supports atomic ops on datatypes less than or equal to
max_fetch_atomic_size bytes.  This commit fixes a bug that required
the datatype to be less than max_fetch_atomic_size bytes.
2015-08-18 11:51:16 -05:00
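The boundary condition fixed by the osc-portals4 commit above amounts to using an inclusive comparison. A minimal sketch, assuming a hypothetical helper `atomic_size_ok` (this is not a function from the osc/portals4 component):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: may an atomic operation be issued for a datatype
 * of the given size? The limit is inclusive, so a datatype of exactly
 * max_fetch_atomic_size bytes is accepted. */
static bool atomic_size_ok(size_t datatype_bytes, size_t max_fetch_atomic_size)
{
    /* "<=" rather than "<": the strict comparison wrongly rejected
     * datatypes whose size equals the limit. */
    return datatype_bytes <= max_fetch_atomic_size;
}
```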
Nathan Hjelm
145bac088d Merge pull request #753 from hjelmn/verbose_standard
Standardize verbosity levels
2015-08-18 09:43:28 -06:00
yosefe
85580ad055 yalla: fix passing on-demand mapping config to mxm. 2015-08-18 15:00:59 +03:00
Edgar Gabriel
5ef0632f9d cleanup the usage of printf vs. opal_output 2015-08-17 14:55:12 -05:00
Nathan Hjelm
2f447b2c4c bml/r2: use the bml framework output and set verbosity level to info
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-17 11:48:06 -06:00
yohann
98b300e1bb mtl/ofi: Require proper ordering by OFI provider. 2015-08-14 16:36:10 -07:00
Edgar Gabriel
072b18e197 Code cleanup for the time breakdown feature in ompio/fcoll
- make the internal structure follow the Open MPI naming convention
 - provide a single flag/macro which controls the compilation/utilization of this
   feature, to avoid that somebody using this has to modify every single
   fcoll component. A configure option could be added later if desired.
2015-08-14 08:53:04 -05:00
Edgar Gabriel
4bfc6ae798 Performance tuning: incorporate the usage of non-blocking operations in our array group-communication operations. 2015-08-13 20:05:18 -05:00
Gilles Gouaillardet
6118236f1a Merge pull request #796 from ggouaillardet/topic/hcoll_config
configury: fix hcoll, fca and mxm detection and revamp yalla Makefile.am
Thanks to David Shrader and Ake Sandgren for bringing this issue to our attention
2015-08-14 08:55:46 +09:00
Edgar Gabriel
9f369ba515 move the inclusion of the lustre_user and liblustreapi header files to the fs_lustre.h file. 2015-08-13 15:36:16 -05:00
Gilles Gouaillardet
6b2fe9120e yalla: fix Makefile.am LDFLAGS 2015-08-13 17:33:52 +09:00
Gilles Gouaillardet
1a238d3a4f configury: fix fca detection
* do not add -I/.../include/fca -I /.../include/fca_core to CPPFLAGS
 * allow configure --with-fca
 * search fca libs in both DIR/lib and DIR/lib64
 * fix the description of the --with-fca option
2015-08-13 11:09:15 +09:00
Gilles Gouaillardet
df98a73131 configury: fix hcoll detection
* do not add -I/.../include/hcoll -I /.../include/hcoll/api to CPPFLAGS
 * allow configure --with-hcoll
 * search hcoll libs in both DIR/lib and DIR/lib64
 * fix the description of the --with-hcoll option
2015-08-13 11:08:56 +09:00
yohann
27520b99b8 mtl/ofi: add include/exclude list MCA vars.
mtl_ofi_provider_include (resp. mtl_ofi_provider_exclude) can be used
to specify which provider(s) the OFI MTL can select (resp. ignore).

e.g. --mca mtl_ofi_provider_include "psm,sockets"

By default, mtl_ofi_provider_exclude is set to "sockets,mxm".

This deprecates the old MCA var named "mtl_ofi_provider".
2015-08-12 13:52:04 -07:00
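One way include/exclude lists of this kind can be evaluated is sketched below. The precedence (a non-NULL include list wins over the exclude list) and the helper names `list_contains` and `provider_allowed` are assumptions made for illustration; the commit does not describe how the OFI MTL actually combines the two variables.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if `name` exactly matches one comma-separated entry of `list`. */
static bool list_contains(const char *list, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = list;

    while ('\0' != *p) {
        size_t len = strcspn(p, ",");   /* length of the current entry */
        if (len == name_len && 0 == strncmp(p, name, len)) {
            return true;
        }
        p += len;
        if (',' == *p) {
            ++p;                        /* skip the separator */
        }
    }
    return false;
}

/* Assumed policy: an include list, when given, is authoritative;
 * otherwise the exclude list filters; with neither, all are allowed. */
static bool provider_allowed(const char *name,
                             const char *include, const char *exclude)
{
    if (NULL != include) {
        return list_contains(include, name);
    }
    if (NULL != exclude) {
        return !list_contains(exclude, name);
    }
    return true;
}
```

With the default exclude list quoted in the commit, `provider_allowed("sockets", NULL, "sockets,mxm")` is false, while passing `"psm,sockets"` as the include list makes it true.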
Jeff Squyres
e9b7203ece treematch: ensure hwloc support is enabled
This commit does the following:

* s/ompi_check_treematch/ompi_topo_treematch/ (i.e., abide by the
  prefix rule)
* change the value of ompi_topo_treematch_happy from yes/no to 0/1, so
  that we can use -eq for numerical comparisons (vs. string
  comparisons).  It's the little things in life, no?
* Check the value of $OPAL_HAVE_HWLOC to ensure that hwloc support is
  enabled.  If not, disqualify treematch from building.
* Fixes a few places that were underquoted
* Convert from "test ... -a ..." to "test ... && test ..."

Fixes open-mpi/ompi#797
2015-08-12 12:23:12 -07:00
Edgar Gabriel
55f0e1a1f8 fix the lustre compilation problems for older lustre versions. Add the prototype for the static function to avoid a warning message. 2015-08-12 09:45:07 -05:00
Jeff Squyres
3be125afff op base: whitespace cleanup
No logical code changes.
2015-08-12 05:35:11 -07:00
Jeff Squyres
a2addbafed op base: move return statement to correct level
This fixes CID 71945.
2015-08-12 05:35:11 -07:00
Nathan Hjelm
624a4a0f82 Merge pull request #699 from hjelmn/libnbc_fixes
coll/libnbc: rewrite parts of libnbc
2015-08-10 14:51:42 -06:00
Jeff Squyres
87db836800 Merge pull request #788 from yburette/topic/deprioritize_some_providers
mtl/ofi: Deprioritize some OFI providers.
2015-08-10 14:45:59 -04:00
Nathan Hjelm
d42e0968b1 coll/libnbc: rewrite parts of libnbc
This commit rewrites parts of libnbc to fix issues identified by
coverity and myself. The changes are as follows:

 - libnbc function would return invalid error codes (internal to
   libnbc) to the mpi layer. These code names are of the form
   NBC_. They do not match up with the error codes expected by the mpi
   layer. I purged the use of all these error codes with the exception
   of NBC_OK and NBC_CONTINUE in progress. These codes are used to
   identify when a request handle is complete.

 - Handles and schedules were leaked by all collective routines on
   error. A new routine was added to return a collective handle
   (NBC_Return_handle).

 - Temporary buffers containing in/out neighbors for neighborhood
   collectives were always leaked.

 - Neighborhood collectives contained code to handle MPI_IN_PLACE which
   is never a valid input for the send or receive buffer. Stripped this
   code out.

 - Files were inconsistently named. Most are nbc_isomething.c but one
   was named coll_libnbc_ireduce_scatter_block.c.

 - Made the NBC_Schedule "structure" an object so it can be
   retained/released. This may enable the use of schedule caching at a
   later time. More testing will be needed to ensure the caching code
   works. If it doesn't the code should be stripped out completely.

 - Added code to simplify the common case of scheduling send/recv +
   barrier.

 - Code cleanup for readability.

The code now passes the clang static analyzer.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-10 11:53:25 -06:00
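The retain/release idea mentioned in the libnbc commit above can be illustrated with a plain reference count. This is a generic sketch under stated assumptions: `schedule_t` and the `schedule_*` functions are hypothetical, it is not thread-safe, and it does not reproduce the OPAL object system that libnbc actually builds on.

```c
#include <stdlib.h>

/* Hypothetical reference-counted schedule (not the libnbc type). */
typedef struct schedule {
    int refcount;   /* number of current owners */
    size_t nbytes;  /* size of the serialized schedule */
    char *data;     /* schedule contents */
} schedule_t;

/* Create a schedule owned by the caller (refcount starts at 1). */
static schedule_t *schedule_new(size_t nbytes)
{
    schedule_t *s = malloc(sizeof(*s));
    if (NULL == s) {
        return NULL;
    }
    s->data = malloc(nbytes > 0 ? nbytes : 1);
    if (NULL == s->data) {
        free(s);
        return NULL;
    }
    s->nbytes = nbytes;
    s->refcount = 1;
    return s;
}

/* Add an owner, e.g. a cache that keeps the schedule for reuse. */
static void schedule_retain(schedule_t *s)
{
    ++s->refcount;
}

/* Drop an owner; frees the schedule and returns 1 when the last one leaves. */
static int schedule_release(schedule_t *s)
{
    if (0 == --s->refcount) {
        free(s->data);
        free(s);
        return 1;
    }
    return 0;
}
```

Because every error path can simply call the release function, ownership becomes explicit, which is one way to avoid the handle and schedule leaks listed in the commit.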
George Bosilca
0a91d7af4d Fix issues identified by Coverity. 2015-08-08 16:41:30 -04:00
Jeff Squyres
bd5bf4a224 Merge pull request #781 from hppritcha/topic/suppress_picky_warning
mca/topo: suppress picky warning
2015-08-08 06:14:52 -04:00
yohann
88038b5261 mtl/ofi: Deprioritize some OFI providers.
Some OFI providers such as "sockets" are used for debugging
purposes mostly. For these providers, other components usually
offer better performance -- e.g. for sockets, the BTL/TCP would
be a better choice.
Thus, we chose to ignore some providers unless explicitly asked
by the user on the command line:

e.g. --mca mtl_ofi_provider sockets
2015-08-07 16:09:51 -07:00
Edgar Gabriel
d719497f82 Performance tuning: increase the priority of the sm sharedfp component to ensure that it is selected if it can run. 2015-08-07 16:32:53 -05:00
Edgar Gabriel
9e29edf15c remove an erroneous parenthesis which prevents the compilation of the lustre adio 2015-08-07 15:22:41 -05:00
Edgar Gabriel
1293d9c69b free memory correctly in case of an error. Fixes CID 131540 and CID 1315419 2015-08-07 13:30:50 -05:00
Edgar Gabriel
0aa3049bfc Performance tuning: change the default behavior of ompio to *not* segment individual read/write operations.
In most cases, performance seems to be better if not segmented.
2015-08-07 13:06:39 -05:00
Edgar Gabriel
db5af26de7 Performance tuning. make sure we catch if the user wants to set the default fileview and replace it with our optimized default file view. Otherwise, performance will suffer. file_get_view should still return the correct filetype, not our optimized default file view. This is the correct version compared to ffa67b9693, which unfortunately broke
some test cases in mpi_test_suite. Thanks to @ggouaillardet for reporting this!
2015-08-07 12:49:58 -05:00
Edgar Gabriel
6f6c01ee8d free the datatypes that were created using type_dup during file_set_view 2015-08-07 11:50:25 -05:00
Edgar Gabriel
1ae4f8c7e6 Revert "Performance tuning. make sure we catch if the user wants to set the default fileview and replace it with"
This reverts commit ffa67b9693.
2015-08-07 09:53:07 -05:00
Gilles Gouaillardet
907c095f66 Merge pull request #779 from edgargabriel/topic/fcoll_fixes
Topic/fcoll fixes
2015-08-07 09:14:31 +09:00
Howard Pritchard
10aac8037f mca/topo: suppress picky warning
When configured with --enable-picky

topo_base_lazy_init.c compiles with a warning:

  CC       base/topo_base_lazy_init.lo
base/topo_base_lazy_init.c:46:67: warning: implicit conversion from enumeration type 'enum mca_base_register_flag_t' to different enumeration type 'mca_base_open_flag_t' (aka 'enum mca_base_open_flag_t') [-Wenum-conversion]
        err = mca_base_framework_open (&ompi_topo_base_framework, MCA_BASE_REGISTER_DEFAULT);

This commit fixes this implicit conversion problem.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-08-05 16:11:04 -06:00
Edgar Gabriel
16d4171f6b the individual component should call internal ompio functions directly. The reason is that otherwise
the redirection to the ompi_file_t structure (and back to the ompio internal structure) is ambiguous and wrong
for the shared file pointer scenario.
2015-08-05 14:31:11 -05:00
Edgar Gabriel
02a4eb2f13 add the ompi_file_t pointer correctly on the ompio file handle for the sm and individual component. 2015-08-05 14:28:27 -05:00
Jeff Squyres
a36d7e6026 treematch: __FUNCTION__ -> __func__ fixes 2015-08-05 05:39:38 -07:00
Jeff Squyres
a0ebbee6ef libnbc: __FUNCTION__ -> __func__ fixes 2015-08-05 05:27:23 -07:00
Gilles Gouaillardet
3d1780f1a2 sharedfp: set f_fh when opening a shared file 2015-08-05 15:07:21 +09:00
Jeff Squyres
047eccef8d Merge pull request #725 from bosilca/treematch
Add a new topo module: Treematch
2015-07-31 15:17:54 -04:00
Howard Pritchard
8649a9f6ef Merge pull request #757 from roblatham00/lustre-excl-open-fix
hint processing should not open files
2015-07-31 12:16:14 -06:00
rhc54
a9b10cfbf0 Merge pull request #761 from jithinjosepkl/master
Fix warnings in direct (pml-cm,mtl-ofi) build
2015-07-31 09:15:30 -07:00
Edgar Gabriel
ffa67b9693 Performance tuning. make sure we catch if the user wants to set the default fileview and replace it with
our optimized default file view. Otherwise, performance will suffer. file_get_view should still return the correct filetype, not our optimized default file view
2015-07-30 19:15:00 -05:00
Edgar Gabriel
93a303ba89 Performance tuning: make sure the individual component is selected for 1 and 2 process communicators (important for some benchmarks) 2015-07-30 17:31:16 -05:00
Edgar Gabriel
9b2a7e41f0 make sure the final number of aggregators is recorded correctly when not using
our aggregator selection logic.
2015-07-30 17:24:01 -05:00
Rob Latham
6e9cbe397f hint processing should not open files
move opening of files from hint processing and into open routines.

This is MPICH commit 92f1c69f0de8 and 22a77dceda11

see https://trac.mpich.org/projects/mpich/ticket/2261
Ref: https://github.com/open-mpi/ompi/issues/158

Signed-off-by: Pavan Balaji <balaji@anl.gov>
2015-07-30 12:25:20 -05:00
Jithin Jose
bc4e8b7e73 Fix warnings in direct (pml-cm,mtl-ofi) build
Signed-off-by: Jithin Jose <jithin.jose@intel.com>
2015-07-29 15:49:37 -07:00
Edgar Gabriel
477083bca3 the memory chunk that has to be allocated for the llapi_get_stripe function seems to have changed compared to earlier versions. This implementation now follows the code snippet from the man pages. 2015-07-29 17:13:55 -05:00
Edgar Gabriel
217dcca853 - the memory chunk that has to be allocated for the llapi_get_stripe function seems to have changed compared to earlier versions. This implementation now follows the code snippet from the man pages.
- implementation of file_get_size and set_size
2015-07-29 17:10:39 -05:00
yohann
6eba52a121 mtl/ofi: add missing return. 2015-07-29 14:14:34 -07:00
Ralph Castain
023936e84b Silence coverity warnings 2015-07-29 07:28:08 -07:00
Edgar Gabriel
a3327fe299 Merge pull request #756 from edgargabriel/pr/nb-sharedfp-splitcoll2
- make the split collective shared file pointer operations work
2015-07-28 19:53:27 -05:00
Edgar Gabriel
3780089ce0 clean up the usage of opal_output vs. printf 2015-07-28 18:27:31 -05:00
Howard Pritchard
377bad18bd Merge pull request #747 from hppritcha/topic/ofi_progress_fix
mtl/ofi: don't inline ofi progress method
2015-07-28 09:42:01 -06:00
Edgar Gabriel
824d488709 - make the split collective shared file pointer operations work
- minor code restructuring in io/ompio required for that.
2015-07-28 09:05:05 -05:00
Edgar Gabriel
e380f8c235 - fix the delete priority of the ompio component
- some applications use MPI_File_delete as a collective function (e.g. IOR), which I think is not really covered by the standard. Right now, one process succeeds and the other ones return an error code. Fix that by no longer returning an error if the file that we try to delete does not exist anymore, to make these applications work.
2015-07-27 15:53:40 -05:00
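The tolerant delete behavior described in the commit above can be sketched with POSIX `unlink` and `errno`. The helper name `delete_if_present` is hypothetical and the sketch does not reproduce the ompio code; it only shows the idea of treating an already-missing file as success.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: delete `path`, treating "file does not exist" as
 * success so that several processes may race to delete the same file.
 * Returns 0 on success (or if the file is already gone), -1 otherwise. */
static int delete_if_present(const char *path)
{
    if (0 == unlink(path)) {
        return 0;       /* this caller removed the file */
    }
    if (ENOENT == errno) {
        return 0;       /* someone else already removed it */
    }
    return -1;          /* a real error, e.g. permission denied */
}
```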
Edgar Gabriel
3fb0614566 mark the request as ACTIVE 2015-07-27 12:43:45 -05:00
Edgar Gabriel
5e166c81a1 Merge pull request #745 from edgargabriel/pr/sharedfp-sm-logic3
Pr/sharedfp sm logic3
2015-07-27 12:04:53 -05:00
Howard Pritchard
f5c43c1185 mtl/ofi: retain inline progress function
Retain inline progress function for ofi
mtl, but have a non-inlined progress function
which is registered with the opal progress
mechanism.

 @jithinjosepkl

I've bad news about the psm provider.  I still notice
segfaults - not always - but frequently at finalize
when using the psm provider.  I don't notice this
when using the sockets provider.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-07-27 09:16:52 -06:00
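The arrangement described in the mtl/ofi commit above (an inline progress function for direct callers plus a non-inlined one for registration) can be sketched as below. All names (`progress_inline`, `progress_noinline`, `register_progress`, `pending_events`) are hypothetical; the real code registers with the opal progress mechanism, which this sketch replaces with a single stored function pointer.

```c
#include <stddef.h>

/* Callback type assumed for the progress engine in this sketch. */
typedef int (*progress_cb_t)(void);

/* Toy state standing in for pending completion-queue events. */
static int pending_events = 3;

/* Inline version: hot paths call this directly so the compiler can
 * fold it into the caller. Returns the number of events handled. */
static inline int progress_inline(void)
{
    if (pending_events > 0) {
        --pending_events;
        return 1;
    }
    return 0;
}

/* Non-inlined wrapper: has a stable address, so it can be handed to a
 * progress engine as a callback. */
int progress_noinline(void)
{
    return progress_inline();
}

/* Stand-in for the progress registration call (an assumption). */
static progress_cb_t registered_cb = NULL;

static void register_progress(progress_cb_t cb)
{
    registered_cb = cb;
}
```

Direct callers keep the inlining benefit, while the engine only ever sees the address of the out-of-line wrapper.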
Gilles Gouaillardet
318a1a40a4 coll/libnbc: ireduce_scatter_block
silence malloc(0) warning reported by Lisandro
2015-07-27 16:23:08 +09:00
George Bosilca
e239de581b Create a new topology framework using the TreeMatch library developed
at Inria Bordeaux. This allows us to take advantage of the remap
capability of MPI to rearrange the ranks based on the weights
provided by the application.

Fix the indentation and protect with __DEBUG__ one fprintf.

Add the Cecill-B license to the imported library.

Fix a compiler warning.

Restrict the TreeMatch dependencies.

The TreeMatch software is released under BSD3 (as indicated by their
copyright information @
https://gforge.inria.fr/scm/viewvc.php/COPYING?view=markup&root=treematch).

Update the README.
2015-07-25 13:30:42 -04:00
Jeff Squyres
3e6694f7ea sharedfp: whitespace cleanup
No code changes.

Replace tabs with spaces and do other whitespace cleanup (via emacs).
2015-07-25 05:46:37 -07:00
Jeff Squyres
868a84d4da sharedfp: have sm_data->mutex always point to the right mutex
Even if the mutex is actually located in
sm_data->sm_offset_ptr->mutex, have sm_data->mutex point to it.  This
avoids a few #if blocks that are otherwise identical.
2015-07-25 05:42:57 -07:00
Edgar Gabriel
4f85e0d833 add the configure logic to check for sem_open and sem_init.
Change the code to rely on HAVE_SEM_OPEN etc. instead of my internal macro.
2015-07-24 10:23:43 -05:00
Edgar Gabriel
d1d23054c6 rename the sm_offset structure to mca_sharedfp_sm_offset to obey the Open MPI naming convention 2015-07-24 10:10:41 -05:00
Edgar Gabriel
c91cb67787 fix a bug in the unnamed semaphore section that was introduced when I tried to unify the named and unnamed semaphore logic. 2015-07-24 10:05:07 -05:00
Edgar Gabriel
57c301f25a remove an erroneous free statement. 2015-07-24 09:44:27 -05:00
Jeff Squyres
6929aca1b7 topo/basic: also remove .windows from Makefile.am 2015-07-22 09:20:43 -04:00
Jeff Squyres
24ca887bd8 topo/basic: remove stale (empty) .windows file 2015-07-22 09:10:50 -04:00
Edgar Gabriel
b484784dca make ompio return gracefully in case something goes wrong early in file_open. 2015-07-20 10:03:16 -05:00
Edgar Gabriel
86c3000e18 fix the delete selection logic in io/base. With the previous version, there was a mismatch
in the version number and no component was selected for file_delete.
2015-07-20 10:01:30 -05:00
Howard Pritchard
466c8b0159 Merge pull request #697 from edgargabriel/pr/nb-coll-part2
pr/nb collective I/O part2
2015-07-14 14:00:39 -06:00
Edgar Gabriel
e355db005e fix the logic for setting stripe size and stripe count in the lustre fs module. Now also takes the MPI_Info object into consideration. 2015-07-14 10:53:19 -05:00
Ralph Castain
683efcb850 Rename the current opal_event_base to opal_sync_event_base in preparation for adding an async progress thread to opal. No functional changes made here - just a simple rename. 2015-07-11 10:08:19 -07:00
Edgar Gabriel
f2af8e94ff - first cut on the io interface changes
- add the C interfaces for the new non-blocking collective I/O functions of MPI 3.1
2015-07-09 10:58:13 -05:00
yosefe
103cac5bd9 yalla: fix mxm configuration parsing.
Take configuration from MXM_MPI_xx instead of MXM_PML_xx, same as mtl
mxm.
2015-07-08 19:18:23 +03:00
Gilles Gouaillardet
9e89985f3d restore whitespaces into the pdf files 2015-07-07 09:17:00 +09:00