Commit Graph

2724 Commits

Author SHA1 Message Date
Nathan Hjelm
79881ca892 btl/vader: prevent double-destruction of endpoints and move endpoint teardown code into destructor
This commit was SVN r32779.
2014-09-23 21:51:15 +00:00
Nathan Hjelm
2d8fba0861 btl/vader: silence warning
This commit was SVN r32778.
2014-09-23 21:33:45 +00:00
Nathan Hjelm
8bd3160432 btl/vader: fix several typos in vader update
This commit was SVN r32775.
2014-09-23 20:25:36 +00:00
Nathan Hjelm
12bfd13150 btl/vader: improve performance for both single and multiple threads
This is a large update that does the following:

 - Only allocate fast boxes for a peer if a send count threshold
   has been reached (default: 16). This will greatly reduce the memory
   usage with large numbers of local peers.

 - Improve performance by limiting the number of fast boxes that can
   be allocated per peer (default: 32). This will reduce the amount
   of time spent polling for fast box messages.

 - Provide new MCA variables to configure the size, maximum count,
   and send count thresholds for fast box allocations.

 - Updated buffer design to increase the range of message sizes that
   can be sent with a fast box.

 - Add thread protection around fast box allocation (locks). When
   spin locks are available this should be updated to use spin locks.

 - Various fixes and cleanup.

This commit was SVN r32774.
2014-09-23 18:11:22 +00:00
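
The send count threshold described in the first bullet of this commit can be sketched roughly as follows. This is a minimal illustration, not the vader implementation: `peer_t`, `peer_note_send`, `FBOX_SEND_THRESHOLD`, and `FBOX_SIZE` are hypothetical names, and only the default threshold of 16 comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical constants: only the default of 16 is from the commit message. */
#define FBOX_SEND_THRESHOLD 16
#define FBOX_SIZE 4096 /* assumed size, for illustration only */

typedef struct {
    unsigned send_count; /* sends to this peer so far */
    void *fbox;          /* NULL until the threshold is reached */
} peer_t;

/* Record one send to the peer and allocate its fast box lazily once the
 * send count reaches the threshold. Returns true if a fast box exists. */
static bool peer_note_send(peer_t *peer)
{
    if (peer->fbox != NULL) {
        return true;
    }
    if (++peer->send_count >= FBOX_SEND_THRESHOLD) {
        peer->fbox = malloc(FBOX_SIZE);
    }
    return peer->fbox != NULL;
}
```

With this shape, a peer that is contacted only a handful of times never costs a fast box, which is the memory saving the commit message refers to.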
Howard Pritchard
1508a01325 Fixes to enable mpirun to work again on Cray
The ess pmi module was not handling aprun launched
daemons.  All daemons were thinking they were vpid 1.

Also, it turns out that on Cray systems using MOM nodes
for launched jobs, just detecting whether or not a
process is in a PAGG container is not sufficient.

Crank up the priority of the alps PLM component in the
event that the configure detected the presence of both
slurm and alps.

Have the ESS pmi component open the pmix framework and
select a pmix component.

This commit was SVN r32773.
2014-09-23 15:37:26 +00:00
Artem Polyakov
f2e586980b Fix timing framework:
1. Fixes according to (http://www.open-mpi.org/community/lists/devel/2014/09/15869.php)
2. Force mpisync:rank0 to gather results. Now sync info is written by rank0 to the output file.
3. Improve mpirun_prof: 1) adapt to the environment (SLURM/TORQUE); 2) recognize some noteset-related mpirun options.

This commit was SVN r32772.
2014-09-23 12:59:54 +00:00
Ralph Castain
70896550bf Per input from Artem, update the copyrights on these files, making sure to include all the licensing info for the files brought over from the mpiperf project.
This commit was SVN r32770.
2014-09-20 14:54:24 +00:00
Artem Polyakov
70587d1804 Remove outdated OPAL parameter "opal_pmi_version". Now PMI selection is handled by PMIx MCA.
This commit was SVN r32767.
2014-09-20 02:30:23 +00:00
Ralph Castain
dfb952fa78 [Contribution from Artem - moved it to svn from git for him]
Replace our old, clunky timing setup with a much nicer one that is only available if configured with --enable-timing. Add a tool for profiling clock differences between the nodes so you can get more precise timing measurements. I'll ask Artem to update the Github wiki with full instructions on how to use this setup.

This commit was SVN r32738.
2014-09-15 18:00:46 +00:00
Vasily Filipov
e26af91a64 BTL/OPENIB: set "max_lmc" param to be "1" and not "all available values" by default.
cmr=v1.8.3:reviewer=miked 

This commit was SVN r32736.
2014-09-15 13:56:41 +00:00
Alex Mikheev
31d0724a08 OMPI: btl openib: fix detection of max registrable memory
Deal with the case when mlx4 module is loaded but device
is not present

cmr=v1.8.3:reviewer=miked

This commit was SVN r32734.
2014-09-15 12:17:23 +00:00
Ralph Castain
fad4384463 Not sure how we could get to this point without having already detected the error, but just to be safe - check for end-of-array and return if error.
Refs trac:4897

This commit was SVN r32731.

The following Trac tickets were found above:
  Ticket 4897 --> https://svn.open-mpi.org/trac/ompi/ticket/4897
2014-09-13 02:23:30 +00:00
Jeff Squyres
66aeadacff opal_search_libs: correctly AC_DEFINE results of search
1. It is not sufficient to put the result of m4_toupper() in a
variable and use that variable as the variable name in
AC_DEFINE_UNQUOTED.  Instead, just use m4_toupper() directly in
AC_DEFINE_UNQUOTED.  Also, save the result value in a "permanent"
variable that isn't erased, just in case autoconf decides to be lazy
about instantiating the body of AC_DEFINE_UNQUOTED and moves it later
(this is probably overkill :-) ).
2. Use the OMPI Way of always defining macros (to 0 or 1).  Then also
slightly change the logic in util/basename.c to just check
OPAL_HAVE_DIRNAME (because it will always be defined).

Refs trac:4894

This commit was SVN r32723.

The following Trac tickets were found above:
  Ticket 4894 --> https://svn.open-mpi.org/trac/ompi/ticket/4894
2014-09-13 00:28:30 +00:00
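
The "always define to 0 or 1" convention in the second point of this commit means C code tests the macro's value with `#if` rather than its existence with `#ifdef`. A minimal sketch, assuming configure always provides `OPAL_HAVE_DIRNAME`; the fallback definition and the helper `uses_system_dirname` are hypothetical and exist only so the fragment compiles on its own.

```c
/* Assumption: configure always defines OPAL_HAVE_DIRNAME to 0 or 1.
 * The fallback below exists only so this sketch compiles standalone. */
#ifndef OPAL_HAVE_DIRNAME
#define OPAL_HAVE_DIRNAME 0
#endif

/* Hypothetical helper: reports whether the system dirname() would be used. */
static int uses_system_dirname(void)
{
#if OPAL_HAVE_DIRNAME /* tests the value, not mere existence */
    return 1;
#else
    return 0;
#endif
}
```

Because the macro is always defined, an `#ifdef` test would be true in both cases, which is why a value test is the appropriate check under this convention.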
Ralph Castain
7269dae2da Per patch from Samuel Thibault, silence warning from Clang
This commit was SVN r32720.
2014-09-12 22:22:11 +00:00
Ralph Castain
0445052a1c Check for multiple declarations of a given MCA param and error out if detected as that can create an ambiguous definition of the param value.
Refs trac:4897

This commit was SVN r32719.

The following Trac tickets were found above:
  Ticket 4897 --> https://svn.open-mpi.org/trac/ompi/ticket/4897
2014-09-12 22:21:30 +00:00
Jeff Squyres
d244b7b860 mca_base_var: fix possibility of unaligned variable assignments
Add a debugging check that ensures that the registered storage is
aligned appropriately for the type that is specified.

When we know that the storage is properly aligned, we can cast the
mbv_storage to the appropriate type and then simply do the assignment.
We used to do this assignment via a union, but clang's
-fsanitize=alignment complained about this.

This commit was SVN r32716.
2014-09-11 23:02:49 +00:00
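
The check-then-cast pattern this commit describes can be sketched as below. This is an illustration under assumptions, not the mca_base_var code: `is_aligned` and `store_int` are hypothetical names, and `storage` stands in for the registered storage pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if ptr satisfies an alignment requirement of
 * `align` bytes, where `align` is a power of two. */
static bool is_aligned(const void *ptr, size_t align)
{
    return ((uintptr_t) ptr & (align - 1)) == 0;
}

/* Store an int through an untyped storage pointer. The assert plays the
 * role of the debugging check; once it holds, a plain cast and assignment
 * is enough. */
static void store_int(void *storage, int value)
{
    assert(is_aligned(storage, _Alignof(int)));
    *(int *) storage = value;
}
```

`_Alignof` requires C11. In a build with `NDEBUG` defined the assert disappears, so the check costs nothing outside debugging builds.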
Ralph Castain
1f2c5863f0 Revert r32675 in favor of a different solution proposed by Brice
This commit was SVN r32715.

The following SVN revision numbers were found above:
  r32675 --> open-mpi/ompi@916f98a3ee
2014-09-11 21:58:48 +00:00
Howard Pritchard
e43715574a remove ignored restrict return type qualifier
The use of restrict in the return type qualifier for mca_btl_vader_reserve_fbox
is being ignored by the GNU compiler.  For newer gcc, one sees this warning only
with -Wignored-qualifiers set, but for older variants of gcc it was reported
that numerous warning messages about this ignored qualifier were being
generated as vader is being compiled.

The warning reported by gcc is

btl_vader_fbox.h:53:47: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
 static inline mca_btl_vader_fbox_t * restrict mca_btl_vader_reserve_fbox (struct mca_btl_base_endpoint_t *ep, const size_t size)

This commit was SVN r32714.
2014-09-11 21:12:41 +00:00
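
A small sketch of the fix pattern: drop the qualifier from the return type and, if the no-aliasing hint is still wanted, apply `restrict` to the caller's own pointer variable instead. The type `fbox_t` and the function `reserve_fbox` are hypothetical stand-ins, not the vader definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for a fast box type. */
typedef struct {
    unsigned char data[64];
} fbox_t;

/* A declaration of this shape is what gcc warns about, because a
 * qualifier on a function return type has no effect:
 *
 *   static inline fbox_t * restrict reserve_fbox(fbox_t *pool, size_t idx);
 *
 * The unqualified form below avoids the warning. */
static inline fbox_t *reserve_fbox(fbox_t *pool, size_t idx)
{
    return &pool[idx];
}
```

A caller can still write `fbox_t *restrict box = reserve_fbox(pool, 1);`, where the qualifier applies to an actual pointer object.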
Rolf vandeVaart
9a2bab0e27 Add support for detecting CUDA managed memory. Disabled for now.
This commit was SVN r32713.
2014-09-11 21:07:17 +00:00
Howard Pritchard
820b34e5d2 Fix bad cut/paste for commit c19e7369
This commit was SVN r32712.
2014-09-11 21:00:04 +00:00
Howard Pritchard
d07c5674a3 Fix potential double free in cray pmi cray_fini
This commit was SVN r32711.
2014-09-11 20:30:40 +00:00
Ralph Castain
cb2ad98f57 Silence an unused function warning
This commit was SVN r32704.
2014-09-10 17:36:34 +00:00
Ralph Castain
a7c5b77d70 Just because the openib BTL can't reach a process doesn't mean it is a job-ending error. If we have other methods for reaching the process (e.g., sm for a local proc), then that's okay. If there is no method for reaching a proc, then that's an error - but the BML will report that situation.
The question of whether or not the openib BTL supports loopback is a separate question. It may be more appropriate to make the modex be PMIX_GLOBAL for cases where openib can support loopback so someone can run without a shared memory component. I'll leave that decision to the IB vendors.

This commit was SVN r32702.
2014-09-10 17:02:16 +00:00
Ralph Castain
93948f0c4e Resolve alignment issues when unpacking buffers
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32698.
2014-09-10 10:19:16 +00:00
Ralph Castain
e671620ac7 Per request from Jeff: tune up the help messages for binding options
Refs trac:4898

This commit was SVN r32691.

The following Trac tickets were found above:
  Ticket 4898 --> https://svn.open-mpi.org/trac/ompi/ticket/4898
2014-09-09 22:39:22 +00:00
Ralph Castain
4207b4c4ad Improve the --bind-to help message to better indicate the default options under various values of np. Remove the warning message if the user doesn't specify a binding policy and we are overloaded
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32687.
2014-09-08 21:03:51 +00:00
Ralph Castain
4df1aa63f7 Since we've run into the situation where someone puts a script wrapper around a launcher such as srun, we need to always protect MCA cmd line params with quotes. This means we also need to protect the backend from quotes coming into the system as part of a value, or else the parser gets confused.
So add a new function for wrapping MCA arguments, and tell the backend parser to ignore/remove leading/trailing quotes.

cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32686.
2014-09-08 20:38:46 +00:00
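
The two halves of this change (quote on the way out, strip on the way in) might look roughly like the sketch below. The function names `wrap_in_quotes` and `strip_quotes` are hypothetical, and the real parser may handle more cases than a single pair of double quotes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: write `value` wrapped in double quotes into `out`.
 * Returns 0 on success, -1 if `out` is too small. */
static int wrap_in_quotes(const char *value, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "\"%s\"", value);
    return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}

/* Hypothetical: remove one trailing and one leading double quote, in place. */
static void strip_quotes(char *s)
{
    size_t len = strlen(s);
    if (len > 0 && s[len - 1] == '"') {
        s[--len] = '\0';
    }
    if (len > 0 && s[0] == '"') {
        /* len bytes: the remaining len - 1 characters plus the terminator */
        memmove(s, s + 1, len);
    }
}
```

Stripping on the receiving side keeps a value such as `"a b"` from reaching the parser with its quotes still attached after a wrapper script has passed it along.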
Ralph Castain
5649841e26 Provide missing include file - generates errors when used with Intel compilers
This commit was SVN r32685.
2014-09-08 19:04:40 +00:00
Ralph Castain
e32d541c8d Bring over a slight modification to the opal_init_test routine
This commit was SVN r32676.
2014-09-07 15:46:53 +00:00
Ralph Castain
916f98a3ee Rename an HWLOC member of a union in the diff.h file to avoid a naming conflict with an external library - it isn't that HWLOC did something wrong, but rather that the name being used is so close to a type name that other folks have a tendency to #define it as well. We could argue with those folks that what they are doing is incorrect, but it is just easier to make a slight change and resolve the problem.
This commit was SVN r32675.
2014-09-07 15:42:05 +00:00
Ralph Castain
6323b226c7 Bring over some updates from the PMIx branch - mostly just minor cleanups. Make the direct grpcomm component no longer be the default. For now, we seem to be having problems with non-blocking fence operations, so make them not be the default under any scenario (e.g., when sm is the only btl in operation).
This commit was SVN r32673.
2014-09-06 19:19:44 +00:00
Ralph Castain
f1a33b6476 Use the accessor function to get the jobid and vpid
This commit was SVN r32672.
2014-09-06 19:18:21 +00:00
Howard Pritchard
fe2ea1f0fb fix handling of OPAL_DSTORE_LOCALITY and ref cnt
This commit was SVN r32671.
2014-09-05 21:36:19 +00:00
Ralph Castain
ec51cbab9f We are failing to use the system dirname function because we are not correctly flagging that we found it. Modify opal_search_libs_core to set an "opal_have_foo" flag to indicate that we found the specified function, and then modify the have_dirname check to look for it.
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32669.
2014-09-04 16:10:38 +00:00
Ralph Castain
41c6058153 Bring over changes to MXM from pmix branch:
MTL MXM: establish endpoint connection on the first communication when direct_modex used

This commit was SVN r32668.
2014-09-03 18:22:11 +00:00
Ralph Castain
a51d1d7a97 find_last_path_separator returns NULL if the filename doesn't contain a path separator in it - i.e., it's just a local file. So protect the loop to avoid a segfault
cmr=v1.8.3:reviewer=rolfv

This commit was SVN r32667.
2014-09-03 18:13:42 +00:00
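
The guard described in this commit amounts to checking the separator pointer before using it. A minimal sketch, assuming '/' as the only separator: `last_path_separator` is a hypothetical stand-in for the helper named in the message, and `base_of` is an invented caller.

```c
#include <string.h>

/* Hypothetical stand-in for the helper named in the commit message:
 * returns a pointer to the last '/' in filename, or NULL if none. */
static const char *last_path_separator(const char *filename)
{
    return strrchr(filename, '/');
}

/* Return the part of filename after the last separator. A bare file name
 * (no separator at all) is returned unchanged instead of dereferencing NULL. */
static const char *base_of(const char *filename)
{
    const char *sep = last_path_separator(filename);
    if (NULL == sep) {
        return filename;
    }
    return sep + 1;
}
```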
Ralph Castain
3fed455bbc If something goes wrong in add_procs, let's not segfault during finalize
This commit was SVN r32665.
2014-09-03 17:27:31 +00:00
Ralph Castain
b372cd02d0 Ensure the hwloc headers get installed when --with-devel-headers is given
This commit was SVN r32663.
2014-09-02 19:58:25 +00:00
Ralph Castain
d13fb37ef9 Add array types to opal_value_t
This commit was SVN r32656.
2014-08-31 08:07:03 +00:00
Ralph Castain
9500939042 Fix abstraction violation
This commit was SVN r32655.
2014-08-31 08:06:35 +00:00
Ralph Castain
60eb7124ab Upgrade to hwloc 1.9.1
This commit was SVN r32652.
2014-08-31 03:13:06 +00:00
Ralph Castain
5cdbc00136 Re-enable the usock oob component. Ensure the TCP component promotes messages for other procs to the OOB base so that other components have a chance to send the relay. Seems to be passing MTT, so let's see how it works for others.
This commit was SVN r32650.
2014-08-30 19:33:46 +00:00
Ralph Castain
9ac75451ff Nathan had requested this before as he needs to know the #procs in the job to optimize the UGNI btl. Add the fetch for that data - the native pmix component already provides it, but ensure the Slurm PMI-1 support does too. If not found, fall back to the non-optimized number
This commit was SVN r32648.
2014-08-29 22:53:35 +00:00
Ralph Castain
f865ef61ab Need local_size returned by the Slurm components
This commit was SVN r32646.
2014-08-29 22:23:27 +00:00
Howard Pritchard
9a2891f2d6 handle PMIX_LOCAL_SIZE attr arg in cray pmix
This commit was SVN r32645.
2014-08-29 21:18:02 +00:00
Ralph Castain
8faabed2cd Add some further initialization and protection for zero-byte messages
This commit was SVN r32644.
2014-08-29 17:24:55 +00:00
Gilles Gouaillardet
6916bfc368 btl/openib: fix use of mca_btl_openib_component.default_recv_qps
- do not have mca_btl_openib_component.default_recv_qps point to the stack
- do not reset mca_btl_openib_component.default_recv_qps in btl_openib_component_open

cmr=v1.8.3:reviewer=miked

This commit was SVN r32642.
2014-08-29 04:41:34 +00:00
Gilles Gouaillardet
b8a2e90f2d btl/openib: fix a typo
cmr=v1.8.3:reviewer=miked

This commit was SVN r32639.
2014-08-29 04:23:42 +00:00
Ralph Castain
730e28349e Some minor uninitialized variable cleanups
This commit was SVN r32629.
2014-08-29 02:21:13 +00:00
Jeff Squyres
733316372b usnic: remove suggestion of enabling no-drop in the fabric
Reviewed by Reese Faucette

cmr=v1.8.3:reviewer=ompi-rm1.8

This commit was SVN r32628.
2014-08-28 23:56:56 +00:00