* automatically retrieve the hostname (and all RTE info) for all procs during MPI_Init if nprocs < cutoff
* if nprocs > cutoff, retrieve the hostname (and all RTE info) for a proc upon the first call to modex_recv for that proc. This would provide the hostname for debugging purposes as we only report errors on messages, and so we must have called modex_recv to get the endpoint info
* BTLs are not to call modex_recv until they need the endpoint info for first message - i.e., not during add_procs so we don't call it for every process in the job, but only those with whom we communicate
My understanding is that only some BTLs have been modified to meet that third requirement, but those include the Cray ones where jobs are big enough that launch times were becoming an issue. Other BTLs would hopefully be modified as time went on and interest in using them at scale arose. Meantime, those BTLs would call modex_recv on every proc, and we would therefore be no worse than the prior behavior.
This commit revises the MPI-RTE interface to pass the ompi_proc_t instead of the ompi_process_name_t for the proc so that the hostname can be easily inserted. I have advised the ORNL folks of the change.
cmr=v1.7.4:reviewer=jsquyres:subject=Fix thread deadlock
This commit was SVN r29931.
The following SVN revision numbers were found above:
r29917 --> open-mpi/ompi@1a972e2c9d
includes various fixes all over the C/R code which are
hard to group like the other patches.
Changes from V1:
* explain why mca_base_component_distill_checkpoint_ready no longer works
* compare return result of opal functions with OPAL_* values
Changes from V2:
* use orte_rml_oob_ft_event() instead of referencing through the modules
* properly protect variable (thanks to --enable-picky)
This commit was SVN r29922.
should be already set to the right value. This fixes a problem
identified by Guillaume Gouaillardet, where using a single
persistent receive leads to leaking the convertor stack memory.
Refs trac:3956
cmr=v1.7.4:reviewer=jsquyres:subject=Correctly handle the convertor internal stack for persistent receives.
This commit was SVN r29920.
The following Trac tickets were found above:
Ticket 3956 --> https://svn.open-mpi.org/trac/ompi/ticket/3956
* default to bind-to core
* map-by slot if np=2
* map-by socket (balance across sockets on each node) if np > 2
* map-by <obj> will imply rank-by <obj> by default (leave default binding as above)
Fix a bug in the map-by <obj> mapper where we incorrectly compute the #procs to assign if the #slots > #procs
cmr=v1.7.4:reviewer=jsquyres:subject=Update default binding and mapping values
This commit was SVN r29919.
got linked together (work on one caused work in the other):
* Clean up a bunch of VAR_SCOPE issues in configure. This includes:
* Using VAR_SCOPE_PUSH and VAR_SCOPE_POP in more places
* Cleaning up the use of some shell variables (e.g., name them better)
* Add support for external libevent via
--with-libevent=<dir-to-libevent-install-tree>, as specifically
asked for by downstream packagers.
* Revamp how wrapper compiler RPATH (and RUNPATH) support is done.
The external libevent work exposed weakenesses in how the original
RPATH/RUNPATH work was done, so we had to re-do it to be a bit more
robust.
This work has not yet been tested on Solaris.
Refs trac:3694
This commit was SVN r29899.
The following Trac tickets were found above:
Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
Noticed these as part of #3694: external libevent's don't cause argv.h
to automatically get included.
Refs trac:3694
This commit was SVN r29897.
The following Trac tickets were found above:
Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
discovered when removing some components.
This commit was SVN r29895.
The following SVN revision numbers were found above:
r29894 --> open-mpi/ompi@58ed00296c
more:
- Remove OPAL_ENABLE_MULTI_THREADS, since it didn't really do anything
correctly. Opal always has threads enabled at this point.
- Remove OMPI_ENABLE_PROGRESS_THREADS, since this hasn't worked in
8 years and it has performance issues we'll never be able to
overcome. Note that we have plans for re-adding async progress, using
a hybrid protocol of async and sync sends.
- OMPI_ENABLE_THREAD_MULTIPLE now determines whether the thread lock
macros do the check or not.
- Condition variables are ALWAYS polling right now, which fixes the thread
live-lock currently found when THREAD_MULTIPLE is turned on.
This commit was SVN r29891.
cmr=v1.7.4:reviewer=brbarret:subject=Disqualify sm btl for hetero procs
This commit was SVN r29882.
The following Trac tickets were found above:
Ticket 2433 --> https://svn.open-mpi.org/trac/ompi/ticket/2433
For exammple, mca_btl_sm.knem_fd remained 0, and mca_btl_sm_component_close() ended up doing closing fd 0 which belongs to someone else.
fixed by Yossi, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7
This commit was SVN r29875.
flexible members.
UDCM is ready to go for 1.7.4 with this patch.
cmr=v1.7.4:ticket=3940
This commit was SVN r29861.
The following Trac tickets were found above:
Ticket 3940 --> https://svn.open-mpi.org/trac/ompi/ticket/3940
http://www.open-mpi.org/community/lists/users/2013/10/22882.php, fix
the value of MPI_STATUS_SIZE for the -i8 case. Thanks to Jim Parker
for bringing up the issue and providing the patch.
Separate patches are required for v1.6 and v1.7 (and will be attached
to their respective tickets), because this breaks ABI, so we need a
non-default configure option to fix the issue but knowingly break ABI.
cmr=v1.7.4:reviewer=bosilca:subject=Fix MPI_STATUS_SIZE for -i8 case
cmr=v1.6.6:reviewer=bosilca:subject=Fix MPI_STATUS_SIZE for -i8 case
This commit was SVN r29858.
Note that this event should never happen within a single OMPI job,
because OMPI will ignore usnic ports that are down. The PORT_ACTIVE
event should only occur if a port ''was'' down and is now ''up''. But
what the heck -- if we ever do get this event, it is harmless -- just
ignore it.
This commit was SVN r29852.