The alps ras and plm components were broken by recent changes in ORTE. This
commit resolves those issues.
Changes:
- Define PMI2_SUCCESS if it isn't defined. This fixes a problem with Cray's
PMI implementation which does not define (for some reason) PMI2_SUCCESS. We
had previously just used PMI_SUCCESS.
- Add missing definition and a typo in pml_alps_module.
- launch_id is no longer available in the orte_node_t structure. Use the
attribute lookup to get the value.
- Do not use an O(n^2) sorting algorithm when putting alps nodes in order. Use
opal_list_sort instead (O(nlogn)).
This commit was SVN r32076.
This won't transition cleanly to the 1.8 series, and may represent too much change, so we'll have to (a) evaluate whether or not to bring it over (once it demonstrates that it does indeed solve the problem), and (b) develop a custom patch for that purpose.
Refs trac:4717
This commit was SVN r32063.
The following Trac tickets were found above:
Ticket 4717 --> https://svn.open-mpi.org/trac/ompi/ticket/4717
(a) default binding policy is in effect. In this case, we will emit a
warning and default to not binding unless the user provided the
"oversubscribe" or "overload" modifier to the "bind-to" option.
(b) user-specified binding policy is in effect. In this case, we will
error out unless the user provided the "oversubscribe" or "overload"
modifier to the "bind-to" option as we cannot meet the directive.
Either "bind-to" modifier (oversubscribe or overload) will be accepted for
now - in 1.9, we will deprecate the "overload" term in favor of
"oversubscribe".
Also added the ability to accept a --bind-to modifier without specifying the binding policy itself so a user can specify overload-allowed with the default policy.
Closes trac:4345
cmr=v1.8.2:reviewer=rhc:subject=resolve handling of overload conditions
This commit was SVN r32005.
The following Trac tickets were found above:
Ticket 4345 --> https://svn.open-mpi.org/trac/ompi/ticket/4345
* allow users to specify just a modifier for map-by instead of requiring that they also specify a policy. Thus, we now accept --map-by :pe=3 as indicating that we should use the default mapping policy, but bind 3 cpus/proc.
* if users specify a pe's/proc but no policy, default to --map-by NUMA to ensure we have access to multiple cpus for the request. This won't guarantee we have access to enough to meet the request, but gives us a chance. In addition, we know that binding a proc to multiple cpus will work best if those cpus are all in the same NUMA, so this provides some degree of optimized behavior.
Per a request from Jeff, define "oversubscribe" for binding as a synonym for the "overload" modifier.
cmr=v1.8.2:reviewer=rhc
This commit was SVN r31967.
This should eliminate the connectivity issues that have been reported, and will make maintenance of this component much easier.
cmr=v1.8.2:reviewer=jsquyres:subject=simplify the OOB/TCP component
This commit was SVN r31956.
http://www.open-mpi.org/community/lists/devel/2014/05/14822.php
Revamp the ORTE global data structures to reduce memory footprint and add new features. Add ability to control/set cpu frequency, though this can only be done if the sys admin has setup the system to support it (or you run as root).
This commit was SVN r31916.
This commit is a slightly better workaround to prevent mesages of
the form:
[unset]:_pmi_alps_get_apid:alps_app_lli_put_request failed
[unset]:_pmi_alps_get_appLayout:pmi_alps_get_apid returned with error: Bad file descriptor
It works by completely disabling PMI in the application process when using
mpirun. This should not be an issue for any apps.
cmr=v1.8.2:reviewer=rhc
This commit was SVN r31882.
grpcomm: fix memory leaks
We were leaking the caddy object used to pass data to the callback
function. This commit fixes these leaks.
oob,rml: fix memory leaks
This commit fixes several leaks:
- Both the oob/base and oob/tcp were leaking objects on their peer
hash tables. Iterate on the hash tables and free any objects.
- Leaked sent messages because of missing OBJ_RELEASE. I placed the
release in ORTE_RML_SEND_COMPLETE to catch all the possible
paths.
ess/base: close the state framework
cmr=v1.8.2:reviewer=rhc
This commit was SVN r31776.