1
1
Граф коммитов

9738 Коммитов

Автор SHA1 Сообщение Дата
Thomas Naughton
86bb6f8bac
Merge pull request #4444 from naughtont3/tjn-fix-plm-monitoring-configury
configury: single quote to avoid trouble with BSD
2017-11-08 10:56:37 -05:00
Jeff Squyres
5ac1ea4512 ompi_info: add ipv6 config param
Allow ompi_info to show whether IPv6 support was enabled in Open MPI
or not.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-11-08 07:02:27 -08:00
Edgar Gabriel
0c3aa44ba3
Merge pull request #4454 from edgargabriel/topic/large-file-bug-fix
io/ompio: fix a bug in handling large write/read operations
2017-11-07 09:46:47 -06:00
Gilles Gouaillardet
18d3897479 fs/ufs: add some more error codes on file_open
set proper error codes in mca_fs_ufs_file_open by mapping the errno value to
the MPI error code.

Refs. open-mpi/ompi#4443

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-11-07 11:09:32 +09:00
Edgar Gabriel
c9bb049d00 io/ompio: fix a bug in handling large write/read operations
This is a bug fix based on a problem reported on the mailing list.
For very large read/write operations, ompio breaks the operation
down into multiple cycles. The problem was that
one of the variables required to maintain its values
across the different cycles did not do that, and because
of that the calculations of the memory offsets was wrong.

Fixes issue #4453

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-11-06 11:48:13 -06:00
Edgar Gabriel
91881c4fdc fs/lustre: update component to match fs/ufs
the fs/lustre component has missed out on a number of updates to the fs/ufs component.
This commit tries to import all the changes performed on the fs/ufs component
w.r.t to the file_open operation, including updates on how the amode is set,
error is propegated and setting the fs_block_size value (which is required for
locking purposes).

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-11-03 16:11:46 -05:00
Edgar Gabriel
1885d99ac7 fs/ufs: set proper error codes on file_open
set proper error codes in mca_fs_ufs_file_open by mapping the errno value to
the MPI error code. Fixes an issue reported on the mailing by Wei-keng Liao

Fixes Issue #4443

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-11-03 16:06:31 -05:00
Thomas Naughton
c5dc41ee1a configury: single quote to avoid trouble with BSD
Signed-off-by: Thomas Naughton <naughtont@ornl.gov>
2017-11-03 11:34:28 -04:00
bosilca
3e1f771045
Merge pull request #4403 from naughtont3/tjn-fix-plm-monitoring-configury
fix PML monitoring configury to compile DSOs
2017-11-02 19:08:20 -04:00
bosilca
63e8a8c608
Merge pull request #4431 from hjelmn/asm_cleanup
opal: rename opal_atomic_cmpset* to opal_atomic_bool_cmpset*
2017-11-02 18:45:56 -04:00
Matias Cabral
c8aa22ee22
Merge pull request #4304 from aravindksg/master
Fix OFI MTL to recognize correct CQ empty scenario and improve error reporting
2017-11-02 11:56:57 -07:00
George Bosilca
b2b3da3046
Do not access the frag after returning it.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-10-31 16:39:23 -04:00
Nathan Hjelm
3ff34af355 opal: rename opal_atomic_cmpset* to opal_atomic_bool_cmpset*
This commit renames the atomic compare-and-swap functions to indicate
the return value. This is in preperation for adding support for a
compare-and-swap that returns the old value. At the same time the
return type has been changed to bool.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-10-31 12:47:23 -06:00
Ralph Castain
27f3d417ca Revert the MPI_Init fence operations to use volatile bool instead of thread macros.
The problem is that the waiting thread is cycling using OMPI_LAZY_WAIT_FOR_COMPLETION so it can exercise opal_progress. This probably isn't as critical for the modex step, but definitely necessary for the barrier at the end of mpi_init. The problem this creates is that the lazy macro exits as soon as "active" becomes false, and then we destruct the lock.

However, wakeup_thread sets "active" to false - and then calls the condition broadcast to wakeup any waiting threads. So there is a race condition between that broadcast and the lock destruct.

Add OPAL_ACQUIRE_OBJECT and OPAL_POST_OBJECT memory barriers to help protect against thread race conditions on some platforms

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-31 08:09:02 -07:00
Aravind Gopalakrishnan
285fc42b4e Fix OFI MTL to recognize correct CQ empty scenario
Currently, the progress function is incorrectly interpreting any error
value other than a positive value or -FI_EAVAIL to mean CQ is empty.
CQ is empty only if fi_cq_read() call returned -EAGAIN error
code. Fix that here.

While at it, fix help text output for calls made to OFI API.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2017-10-30 12:13:44 -07:00
Aravind Gopalakrishnan
bea4503f95 Move help text output regarding PSM2_CUDA envvar to component init phase
The messages should be printed only in the event of CUDA builds and in the
presence of supporting hardware and when PSM2 MTL has actually been selected
for use. To this end, move help text output to component init phase.

Also use opal_setenv/unsetenv() for safer setting, unsetting of the environment
variable and sanitize the help text message.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2017-10-26 16:01:01 -07:00
Thomas Naughton
86d282d6dd fix PML monitoring configury to compile DSOs
Signed-off-by: Thomas Naughton <naughtont@ornl.gov>
2017-10-26 15:53:11 -04:00
Ralph Castain
cf3bc4f55b Merge pull request #4346 from matcabral/psm2_mtl_mq_thread_fix
MTL PSM2: add a thread lock while peeking and completing the psm2 requests.
2017-10-24 16:41:29 -05:00
Ralph Castain
0353be9704 Update MPI init to properly skip barriers
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-23 19:28:34 -07:00
Ralph Castain
6ea3c8a0bd Update the interlib example to show an alternative method for model declaration. Add a missing range value to the OPAL layer. Make it easier to see OMPI model callbacks
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-23 11:27:42 -07:00
Matias Cabral
b81bcd4b0d MTL PSM2: add a thread lock while peeking and completing the psm2
requests.
Reviewed-by: Gopalakrishnan, Aravind <aravind.gopalakrishnan@intel.com>
Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
2017-10-20 14:46:48 -07:00
Edgar Gabriel
be0de21e6f fs/ufs and fbtl/posix: cleanup lock management
This commit looks large, but its really mostly a cleanup step.
1. introduce proper error handling for the return values of fcntl and the fbtl_posix_lock function
2. rename a parameter to more accurately reflect what it does
3. introduce an mca parameter in the fs/ufs component that allows to control
   what the level of locking the user would like to enforce
4. move the initialization of the fs_block_size parameter from fs/ufs into the
   common/ompio component. An fs component might be allowed to overwrite this
   value, but none of the actual fs components do that.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-10-19 14:56:28 -05:00
Edgar Gabriel
e62f9d2e52 fs/ufs: ensure that the never-lock flag is set if not on NFS
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-10-19 13:32:40 -05:00
Edgar Gabriel
f66c55f77a fbtl/posix: fixes in the offset calculation and for aio operations
our own internal testsuite passes now correctly. More testing to follow.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-10-19 13:32:39 -05:00
Edgar Gabriel
a3c638bc38 fbtl/posix: add support for file locking for the non-blocking operations
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-10-19 13:32:38 -05:00
Edgar Gabriel
415e76514d fbtl/posix: make the code compile
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-10-19 13:32:37 -05:00
Edgar Gabriel
f5e158c869 fbtl/posix: first cut in adding locking support
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-10-19 13:32:37 -05:00
Gilles Gouaillardet
9771c575f5 Merge pull request #4352 from edgargabriel/pr/sem_close_fix
sharedfp/sm: close the named semaphore
2017-10-19 17:04:43 +09:00
Edgar Gabriel
4d995bd4eb sharedfp/sm: close the named semaphore
in case a named semaphore is used, it is necessary to close the semaphore to remove
all sm segments. sem_unlink just removes the name references once all proceeses have closed
the sem.

Fixes issue: #4336

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>

sharedfp/sm: unlink only needs to be called by one process

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-10-18 10:37:30 -05:00
Aurelien Bouteiller
3ef23f41a3
Bugfix a crash when a comm cannot be initialized
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2017-10-18 11:32:37 -04:00
Valentin Petrov
1e311b2619 coll/hcoll: dtype fallback optimization
If hcoll fails to create mpi derived type let's set zero_dte on this dtype.
    This will save cycles on subsequent collective calls with the same derived
    type since we will not try to create hcoll type again.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2017-10-06 10:29:29 +03:00
Valentin Petrov
06ef344630 coll/hcoll: extends dtypes support
Adds support for legacy MPI_UB/LB types (old apps may use it) as
    well as for BOOL/WCHAR.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2017-10-06 10:29:29 +03:00
Geoff Paulsen
be7b0af5d9 Merge pull request #3609 from markalle/pr/single_type_with_LB_UB
single_predefined_type with MPI_LB/UB
2017-10-04 15:13:09 -05:00
Mark Allen
e24d5ccb7e single_predefined_type with MPI_LB/UB
The ompi_datatype_get_single_predefined_type_from_args() recurses down
into a constructed type to identify what base datatype it's built from
if it's built from a single type.  But if the type has MPI_LB/MPI_UB,
for example
    lens[0] = 1;
    lens[1] = 1;
    disps[0] = 0;
    disps[1] = 0;
    types[0] = MPI_LB;
    types[1] = MPI_INT;
    MPI_Type_create_struct(2, lens, disps, types, &mydt);
then this function will see the base type MPI_LB as differing from MPI_INT
and will identify mydt as not being constructed from a single base type, so
the type will be rejected for calls like MPI_Accumulate.

I think those "meta data" types shouldn't result in rejection like that, and
the above mydt should still be identified as having a single base type
of MPI_INT.

Addition: boslica wanted another change discussed here
    https://github.com/open-mpi/ompi/pull/3609
relating to the calculation for "count" after identifying the
predefined_type that was being used.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2017-10-03 19:08:18 -04:00
George Bosilca
bdbea63a1c
Update the MPI standard reference.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-10-03 16:48:50 -04:00
George Bosilca
a3ac67be0d
Remove double include.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-10-02 21:33:40 -04:00
Gilles Gouaillardet
9492766dbd romio: disable Fortran support
romio314 is a just a component that does not require Fortran bindings,
so simply disable Fortran support to prevent warnings about deprecated flags

Fixes open-mpi/ompi#4281

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-10-02 21:44:33 +09:00
Matias Cabral
d9b2c94d4a Merge pull request #4286 from aravindksg/master
Use opal_show_help to warn about PSM2_CUDA envvar setting
2017-10-01 11:10:19 -07:00
George Bosilca
2a2db13b32
Gracefully deal with a get returning 1 (complete right away).
Kudos to @EmmanuelBRELLE for spotting it.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-10-01 02:24:02 -04:00
Aravind Gopalakrishnan
f8a2b7f6bf Use opal_show_help to warn about PSM2_CUDA envvar setting
If Open MPI is configured with CUDA, then user also should be using a CUDA build of
PSM2 and therefore be setting PSM2_CUDA environment variable to 1 while using
CUDA buffers for transfers. If we detect this setting to be missing, force set
it. If user wants to use this build for regular (Host buffer) transfers, we
allow the option of setting PSM2_CUDA=0, but print a warning
message to user that it is not a recommended usage scenario.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2017-09-29 17:04:10 -07:00
George Bosilca
92e4ecb618
Fix the offset computation.
Also ensure the marked array is correctly freed in all cases.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-09-27 11:42:21 -04:00
George Bosilca
0ceed71368
Fix Coverity warnings in treematch topo.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-09-26 22:24:01 -04:00
Edgar Gabriel
4c0d347412 Merge pull request #4230 from edgargabriel/topic/no-smart-fview
io/ompio: add a new grouping option avoiding communication
2017-09-26 10:56:06 -05:00
bosilca
f44e674992 Merge pull request #4074 from bosilca/topic/coverity
Fix coverity complaints.
2017-09-25 15:05:31 -04:00
Guillaume Mercier
4e7c130c31
Add correct reordering computation in partially distributed case.
Replaced matching array with k and bcast with scatter.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Guillaume Mercier <mercier@labri.fr>
2017-09-25 13:10:11 -04:00
George Bosilca
3dd1d8cb53
Delay the first check for the HWLOC topology.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-09-25 13:09:57 -04:00
George Bosilca
28046b37df
Always succesfully return.
As the reordering is an optional step, if any operation during the
reorder fails we can return the duplicata of the original communicator
associated with the topology information.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-09-25 12:53:20 -04:00
George Bosilca
219a96fa69
Prevent memory leaks.
Reorder the code to simplify the memory management.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-09-25 12:53:20 -04:00
George Bosilca
64bff0e326
Disable monitoring if we compile statically.
Protect all components against compilation on static builds.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-09-25 12:18:23 -04:00
George Bosilca
458ccc12e1
Move the profiling library in common/monitoring
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-09-25 12:18:23 -04:00