1
1
Граф коммитов

20485 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
ab52f16100 Attempt to cleanup the race condition Rolf keeps encountering in MTT by adding some protection to ensure orted's try to terminate once their local procs die. Also, fix a problem whereby a failure to comm_spawn would result in a hang of the parent process.
cmr=v1.8.2:reviewer=rhc:subject=cache termination cleanups

This commit was SVN r32008.
2014-06-16 20:46:35 +00:00
George Bosilca
84193fff6d More comprehensible error messages.
This commit was SVN r32007.
2014-06-16 20:23:16 +00:00
Ralph Castain
561983ae52 Fix static builds by renaming conflicting type
This commit was SVN r32006.
2014-06-14 17:39:28 +00:00
Ralph Castain
3f04d50cb0 Per the ticket, resolve our handling of overload conditions to provide a more consistent response. If we are overloaded (i.e., attempting to bind more processes to a location than the number of cpus under that location), then we consider the following conditions:
(a) default binding policy is in effect. In this case, we will emit a
warning and default to not binding unless the user provided the
"oversubscribe" or "overload" modifier to the "bind-to" option.

(b) user-specified binding policy is in effect. In this case, we will
error out unless the user provided the "oversubscribe" or "overload"
modifier to the "bind-to" option as we cannot meet the directive.

Either "bind-to" modifier (oversubscribe or overload) will be accepted for
now - in 1.9, we will deprecate the "overload" term in favor of
"oversubscribe".

Also added the ability to accept a --bind-to modifier without specifying the binding policy itself so a user can specify overload-allowed with the default policy.

Closes trac:4345

cmr=v1.8.2:reviewer=rhc:subject=resolve handling of overload conditions

This commit was SVN r32005.

The following Trac tickets were found above:
  Ticket 4345 --> https://svn.open-mpi.org/trac/ompi/ticket/4345
2014-06-14 15:38:32 +00:00
Ralph Castain
811f3d0665 Remove unused vars and actually test the indexed array member in the bitmap class
This commit was SVN r32004.
2014-06-14 03:18:24 +00:00
George Bosilca
13252206ad Correctly handle the case where we restrict the size to INT_MAX.
This commit was SVN r32003.
2014-06-13 21:32:46 +00:00
George Bosilca
fbe69808f2 A faster implementation of the OPAL_BITMAP. The corresponding
test has also been updated.

This commit was SVN r32001.
2014-06-13 21:15:35 +00:00
Ralph Castain
56c3575c0e Can't emit an error for an unrecognized mapping policy modifier as the ppr policy relies on not doing so.
This commit was SVN r31998.
2014-06-13 20:10:09 +00:00
Ralph Castain
3ed282bf44 Per patch from Tetsuya, correct the cpus-per-proc logic so we correctly detect when the user is attempting to bind too low for that option
Refs trac:4702

This commit was SVN r31988.

The following Trac tickets were found above:
  Ticket 4702 --> https://svn.open-mpi.org/trac/ompi/ticket/4702
2014-06-13 16:32:52 +00:00
George Bosilca
542e4996a7 Cleanup the utilities functions in tuned.
This commit was SVN r31987.
2014-06-13 16:04:45 +00:00
Oscar Vega-Gisbert
0b856316f8 java: add the oshmem Java examples into the examples/Makefile
This commit was SVN r31986.
2014-06-13 06:54:11 +00:00
Gilles Gouaillardet
50256c62c5 Fix MPI_Alltoallv in coll/tuned.
Correctly handle the corner case in MPI_Alltoallv when
some tasks have no data to transfer and some other tasks
do have data to transfer.

This test case is covered in ibm/collective/alltoallv_somezeros
from the ompi-tests repo.

cmr=v1.8.2:reviewer=bosilca

This commit was SVN r31985.
2014-06-13 06:03:23 +00:00
Mike Dubman
b51a42aeca MXM: fix mxm cleanup, should be called for any compat API
fixe by miked, reviewed by yossi

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31984.
2014-06-12 15:46:38 +00:00
Oscar Vega-Gisbert
cdf0bcc6d1 Java - define macro OSHMEM_SETUP_JAVA_BINDINGS
This commit was SVN r31983.
2014-06-11 21:49:31 +00:00
George Bosilca
fd0e1b7261 If we detect an error on a request that has been already released
at the MPI level, we should call abort on MPI_COMM_WORLD.

Fixes ticket #1943.
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31982.
2014-06-10 16:24:13 +00:00
Ralph Castain
ba926d8635 The TCP component will have set the hash table entry to NULL, but that doesn't remove the key. So the hash_table retrieval function will return success, but with a NULL pointer - protect against that scenario
Patch provided by Gilles - reviewed by rhc.

RM-approved

cmr=v1.8.2:reviewer=ompi-gk1.8

This commit was SVN r31971.
2014-06-09 17:46:22 +00:00
Ralph Castain
5d5ae41ea5 Cleanup a memory leak in the daemons - thanks to Artem for spotting it
This commit was SVN r31970.
2014-06-09 17:14:02 +00:00
Alex Mikheev
3b5fa97790 OSHMEM: fixes problem with local heap2heap copy
check for possibility of heap2heap copy was incorrect
in case when shared heaps have different virtual
addresses on same host.

It seems that ibv_exp_reg_mr() on CIB cards may return
different VAs for heap on same node. On CX3 addresses are
the same.

reviewed by miked

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31969.
2014-06-09 09:41:44 +00:00
Alina Sklarevich
7b8ad47e93 MXM: fix env variable name to hint for thread usage in mxm
reviewed by MikeD
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31968.
2014-06-09 06:40:32 +00:00
Ralph Castain
06dbfa3098 Make the cpus-per-proc equivalent a little more intuitive:
* allow users to specify just a modifier for map-by instead of requiring that they also specify a policy. Thus, we now accept --map-by :pe=3 as indicating that we should use the default mapping policy, but bind 3 cpus/proc.

* if users specify a pe's/proc but no policy, default to --map-by NUMA to ensure we have access to multiple cpus for the request. This won't guarantee we have access to enough to meet the request, but gives us a chance. In addition, we know that binding a proc to multiple cpus will work best if those cpus are all in the same NUMA, so this provides some degree of optimized behavior.

Per a request from Jeff, define "oversubscribe" for binding as a synonym for the "overload" modifier.

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31967.
2014-06-08 20:26:59 +00:00
Ralph Castain
8db76e9c6f Ensure that we change to the session dir if we preload binaries so we'll use the loaded one
Special patch created for v1.8 and CMR filed

This commit was SVN r31963.
2014-06-06 21:43:23 +00:00
Mike Dubman
55e35e0f6e OSHMEM: Fix issue with incorrect mca variables registration
Few components had wrong mca variables registration procedure
List of them:
- atomic basic and mxm
- spml yoda and ikrit
Two mca variables as runtime_api_verbose and runtime_lock_recursive change
names to oshmem_api_verbose and oshmem_lock_recursive otherwise they
were not shown by oshmem_info tool.

fixed by Igor, reviewed by Miked
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31962.
2014-06-06 17:36:47 +00:00
Ralph Castain
f197af530c Add print support for more of the opal_value_t values
This commit was SVN r31961.
2014-06-06 17:28:00 +00:00
Ralph Castain
b7c08582ba Add new tag to avoid conflicts
This commit was SVN r31960.
2014-06-06 17:23:35 +00:00
Ralph Castain
638c24f655 Correct the bind-in-place algorithm to better handle comm_spawn. If the location identified by the mapper is already occupied by procs from another job, then we need to shift either right or left until we find an unoccupied location where we can be bound. If nothing is available, then check for the overload flag (and bind us in the original location if provided), or see if this was the default binding policy instead of one specified by the user - if so, then just don't bind this process.
cmr=v1.8.2:reviewer=rhc

This commit was SVN r31959.
2014-06-06 12:36:14 +00:00
Gilles Gouaillardet
90c2f4a10a Fix unpack_ooo test
The test fails on a 32 bits system.
The root cause is a rounding error when testing double numbers.

This commit was SVN r31958.
2014-06-06 07:53:28 +00:00
Gilles Gouaillardet
d26ac02b4a #if OPAL_HAVE_HWLOC protect access to orte_proc_info_t.cpuset
Fix a bug when trunk is configured with --without-hwloc

v1.8 is safe so no cmr

This commit was SVN r31957.
2014-06-06 07:25:39 +00:00
Ralph Castain
e21bfeadcd Now that the BTLs are moving down to OPAL and becoming available to ORTE, there no longer is a need/desire to push performance in the OOB/TCP component. So we don't need multiple modules driving NICs in parallel, and can drop all the complicated distribution logic. Fall back to the simplified single module model, but retain the ability to run that module in its own progress thread if so directed.
This should eliminate the connectivity issues that have been reported, and will make maintenance of this component much easier.

cmr=v1.8.2:reviewer=jsquyres:subject=simplify the OOB/TCP component

This commit was SVN r31956.
2014-06-06 02:24:17 +00:00
George Bosilca
f5ebd2faeb Fix the Fortran issue identified by Akan Sang Loon. The dist graph
is really special as the weights can be one of the following three
values (NULL, EMPTY or some legal value). As such, we need a complex
if to correctly convert the Fortran value to the corresponding C
value. Thus, always defining the c_ array is the simplest and most
straighforward approach.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31955.
2014-06-05 17:10:48 +00:00
Ralph Castain
ae5eec3e2f Minor tweaks to the pack/unpack routines
This commit was SVN r31954.
2014-06-05 14:49:37 +00:00
Ralph Castain
34cb137314 Add another attribute to the orte_proc_t area
This commit was SVN r31953.
2014-06-05 14:48:19 +00:00
George Bosilca
40d2c75046 Add a slightly modified version of Gilles test for the
irregular packing/unpacking of datatypes.

This commit was SVN r31952.
2014-06-04 18:33:30 +00:00
Ralph Castain
248a4b100f Per Artem, we don't know our VPID at the time of getting the initial timing mark, so just get it if timing is requested
This commit was SVN r31951.
2014-06-04 16:28:41 +00:00
Jeff Squyres
6cf49a3c57 Revert r31934: that MPI_Type_f2c.3 is already in the file list.
This commit was SVN r31950.

The following SVN revision numbers were found above:
  r31934 --> open-mpi/ompi@3ed4aaea99
2014-06-04 13:58:42 +00:00
Ralph Castain
b2413a6b88 Cannot update the proc state prior to activating the state machine as some callback functions need to compare the prior proc state against the new one.
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31949.
2014-06-04 03:40:08 +00:00
Ralph Castain
b771388fa7 We really need to send *all* the daemon info whenever the daemon job has changed as new daemons need a full nidmap
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31948.
2014-06-04 03:38:54 +00:00
Ralph Castain
c5384d44d7 Protect against NULL result in get_attr
This commit was SVN r31947.
2014-06-04 03:09:37 +00:00
Ralph Castain
8e768de317 Really should check our own node for oversubscription, not the HNP
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31946.
2014-06-04 03:09:02 +00:00
Ralph Castain
5398158d5a Move fclose inside bracket to protect against NULL fp
This commit was SVN r31945.
2014-06-04 03:08:18 +00:00
Oscar Vega-Gisbert
c7b229f03e Java - slice methods: set buffer limit to its capacity before change its position
This commit was SVN r31943.
2014-06-03 21:32:27 +00:00
Ralph Castain
f1978fba7c Cleanup a set of typos on the orte_get_attribute call
This commit was SVN r31942.
2014-06-03 20:36:38 +00:00
Jeff Squyres
3ed4aaea99 man pages: Add MPI_Type_f2c.3 file into the tarball
cmr=v1.8.2:reviewer=bosilca

This commit was SVN r31934.
2014-06-03 18:28:50 +00:00
Jeff Squyres
87f9f6815f use-mpi-tkr: fix ierr->ierror param names
Issue noted by Walter Spector on the user's mailing list.

Throwing to Craig Rasmussen for review.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31933.
2014-06-03 15:26:02 +00:00
Jeff Squyres
82d4e33510 usnic: let connectivity client timeout if agent never appears
This would be a really, really weird case if it ever happens (i.e.,
you have usnics but the agent process failed somewhere in MPI_INIT
such that the agent never appears), but having an infinite loop
doesn't seem like a good idea.

(does not need to go to v1.8 because v1.8 still uses RML for
communication for the connectivity checker)

This commit was SVN r31932.
2014-06-02 18:14:35 +00:00
Jeff Squyres
af04d60098 usnic: do not enable the connecitivty agent if there are no usnics
Reviewed by Dave Goodell.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31931.
2014-06-02 18:09:55 +00:00
Ralph Castain
5668f085a3 Silence some useless warnings, and fix a missed updated in the tm plm
This commit was SVN r31930.
2014-06-02 17:57:56 +00:00
Ralph Castain
742c0d2284 Fix typo that would cause a segfault if orte_startup_timeout was set
This commit was SVN r31929.
2014-06-02 15:59:18 +00:00
Ralph Castain
7df500ecf5 Break the loop caused by retrying to send a message to a hop that is unknown by the TCP oob component. We attempt to provide a way for other components to try, but need to mark that the TCP component is not able to reach that process so the OOB base will know to give up.
This commit was SVN r31928.
2014-06-02 15:00:33 +00:00
Ralph Castain
03234f2a33 Revert r31926 and replace it with a more complete checking of availability and accessibility of the required freq control paths.
This commit was SVN r31927.

The following SVN revision numbers were found above:
  r31926 --> open-mpi/ompi@9779084352
2014-06-02 14:34:00 +00:00
Gilles Gouaillardet
9779084352 orte_rtc_base_select: skip a RTC module if it has a zero priority
This commit was SVN r31926.
2014-06-02 07:16:41 +00:00