- Delete unnecessary header files using
contrib/check_unnecessary_headers.sh, after applying
patches that add the headers which would otherwise be "lost"
because they were only pulled in by one of the now-deleted headers.
In total 817 files are touched.
In ompi/mpi/c/, header files are moved up into the actual c-file
where necessary (these are the only additional #include lines);
everything else is deletion of #include lines (apart from the
additions mentioned above, required due to notifier...)
- To cover different MCAs (OpenIB, TM, ALPS), an earlier version was
successfully compiled (yesterday) on:
Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled
This commit was SVN r21096.
Several header files (previously included by other header files)
now have to be moved "upward".
These are mainly system headers such as string.h, stdio.h and the
networking headers, but also some orte headers.
This commit was SVN r21095.
Where we use memcmp, strlen, strdup and friends, include <string.h>.
Also, several constants.h headers are not included directly.
- Let's have mca_topo_base_cart_create return ompi-errors in
ompi/mca/topo/base/topo_base_cart_create.c
This commit was SVN r20773.
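
As an illustration of the <string.h> point in r20773: a file that calls memcmp or strlen should include <string.h> itself rather than rely on some other header pulling it in. The sketch below is not Open MPI code; has_prefix is a hypothetical helper used only to show the pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>   /* memcmp, strlen: included directly by the file that uses them */

/* Hypothetical helper (not part of Open MPI): returns 1 if s starts
 * with prefix, 0 otherwise. */
int has_prefix(const char *s, const char *prefix)
{
    size_t n;

    assert(NULL != s && NULL != prefix);
    n = strlen(prefix);
    /* Compare only when s is at least as long as the prefix. */
    return strlen(s) >= n && 0 == memcmp(s, prefix, n);
}
```

If the #include <string.h> line were dropped and no other header supplied it, a build with -Wimplicit-function-declaration would be expected to flag the memcmp and strlen calls.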
Adapt orte_process_info to orte_proc_info, and
change orte_proc_info() to orte_proc_info_init().
- Compiled on linux-x86-64
- Discussed with Ralph
This commit was SVN r20739.
The only include file needed inside proc_info.h is opal/dss/dss_types.h
- In one case (orte/util/hnp_contact.c), proc_info.h had to be added again.
- Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
works fine, no errors.
Again, let's let MTT have the last word.
This commit was SVN r20631.
Often, orte/util/show_help.h is included although none of its functionality
is required; what is actually needed is most often opal_output.h or
orte/mca/rml/rml_types.h.
Please see orte_show_help_replacement.sh, committed next.
- Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
actually showed two *missing* #include "orte/util/show_help.h"
in orte/mca/odls/base/odls_base_default_fns.c and
in orte/tools/orte-top/orte-top.c
Manually added these.
Let's let MTT have the last word.
This commit was SVN r20557.
Correct the orte-show-help file for the case where a rank is out of bounds, and perform that test at a point where a wildcard doesn't get incorrectly flagged as out-of-bounds.
This commit was SVN r20398.
1. fix a race condition whereby a proc's output could trigger an event prior to the other outputs being set up, thus causing the IOF to declare the proc "terminated" too early. This was really rare, but could happen.
2. add a new "timestamp-output" option that timestamps each line of output
3. add a new "output-filename" option that redirects each proc's output to a separate rank-named file.
4. add a new "xterm" option that redirects the output of the specified ranks to a separate xterm window.
This commit was SVN r20392.
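
The "timestamp-output" behavior in r20392 (a timestamp on each line of output) can be sketched roughly as follows. This is not the IOF implementation: prefix_lines is a hypothetical helper, and the timestamp format is an assumption left to the caller, since the log does not specify it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the ORTE IOF code): copies `in` to `out`,
 * inserting "<stamp> " at the start of every line. Returns the number
 * of bytes written (excluding the terminating NUL), or -1 if `out`
 * is too small. */
int prefix_lines(const char *stamp, const char *in, char *out, size_t outlen)
{
    size_t used = 0;
    size_t stamplen;
    int at_line_start = 1;

    assert(NULL != stamp && NULL != in && NULL != out);
    stamplen = strlen(stamp);

    for (; '\0' != *in; ++in) {
        if (at_line_start) {
            /* Need room for the stamp, one space, and the final NUL. */
            if (used + stamplen + 1 >= outlen) return -1;
            memcpy(out + used, stamp, stamplen);
            used += stamplen;
            out[used++] = ' ';
            at_line_start = 0;
        }
        /* Need room for this character and the final NUL. */
        if (used + 1 >= outlen) return -1;
        out[used++] = *in;
        if ('\n' == *in) at_line_start = 1;
    }
    if (used >= outlen) return -1;
    out[used] = '\0';
    return (int)used;
}
```

A caller would typically build the stamp string with time() and strftime() just before forwarding a chunk of output. Passing the stamp in as a parameter keeps the line-splitting logic independent of the clock.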
SIGCONT to the a.outs. By default, they are not forwarded and
the behavior remains as it has always been. However, if one
runs with --mca orte_forward_job_control 1, then mpirun will
catch those two signals and forward them to the orteds which
will deliver them to the a.outs. We have had requests for
this feature.
This commit was SVN r20391.
If the --wdir option is given, check to see if the user provided a relative path. If so, convert it to an absolute path. This is needed to maintain consistent behavior across environments. Some environments automatically chdir to your current working directory when launching the remote orted, while others (e.g., ssh) don't. This levels the playing field and reduces user surprise.
This commit was SVN r20342.
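
The relative-to-absolute conversion described for --wdir in r20342 could look roughly like the sketch below. It is a minimal illustration, not the orterun code: make_absolute_wdir is a hypothetical name, and the current working directory is passed in explicitly (a caller would normally obtain it with getcwd()).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not the ORTE implementation): returns a newly
 * allocated absolute path for wdir, or NULL on allocation failure.
 * The caller frees the result. */
char *make_absolute_wdir(const char *cwd, const char *wdir)
{
    size_t len;
    char *result;

    assert(NULL != cwd && NULL != wdir);

    if ('/' == wdir[0]) {
        /* Already absolute: return an unchanged copy. */
        result = malloc(strlen(wdir) + 1);
        if (NULL != result) {
            strcpy(result, wdir);
        }
        return result;
    }

    /* Relative: prepend the launch directory and a separator. */
    len = strlen(cwd) + 1 + strlen(wdir) + 1;
    result = malloc(len);
    if (NULL == result) {
        return NULL;
    }
    snprintf(result, len, "%s/%s", cwd, wdir);
    return result;
}
```

Resolving the path once, on the launching side, means every remote daemon receives the same absolute directory regardless of whether its launcher performed a chdir first.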
Reverted r20306 since the fix caused 100% failures on our BigRed system.
See the comments on ticket #1763 for the details.
This commit was SVN r20339.
The following SVN revision numbers were found above:
r20306 --> open-mpi/ompi@8c87e48721
The following Trac tickets were found above:
Ticket 1763 --> https://svn.open-mpi.org/trac/ompi/ticket/1763
* Improved the error propagation from a backend orted
* Fixed a hang in orterun due to failed file transfers
* Fix the movement of files with relative path names
* Improved error messages when a file cannot be moved
* Move file checks to FileM instead of embedding them in the ODLS
This commit Refs trac:1770
This commit was SVN r20331.
The following Trac tickets were found above:
Ticket 1770 --> https://svn.open-mpi.org/trac/ompi/ticket/1770
Also, per chat with Jeff, modified the Makefile.am's of a few orte tools so that they were consistent in the way we generate the ompi-equivalent cmds.
This commit was SVN r20165.
Basically, the remaining problem turned out to be:
1. closing stdout/stderr during orte_finalize of mpirun
2. inadvertently setting up a write event on fd = -1
3. devising a scheme to more accurately track when the stdin write event was active vs closed so it only got released once
This passed prelim MTT testing by Jeff and Tim, but should soak for a while before migrating to 1.3.
This commit was SVN r20106.
The following SVN revision numbers were found above:
r20064 --> open-mpi/ompi@a07660aea8
r20068 --> open-mpi/ompi@ec930d14a9
r20074 --> open-mpi/ompi@2940309613
It is a small launch performance improvement as now we relay the launch cmd across to the next daemon before taking the time to launch our own local procs. Still, it does allow more parallel operations during the launch procedure.
This commit was SVN r20104.
1. a direct callback from waitpid - this set the waitpid_fired flag
2. a notify event callback from the IOF - this set the iof complete flag
3. a message via the daemon cmd processor from the proc "de-registering" the sync, thus indicating it was going through MPI_Finalize.
The problem is that these could overlap, with the first two allowing the orted to declare the proc complete before the daemon had responded to #3.
This change forces all three events to flow through the daemon cmd processor, thus ensuring an ordered handling. I'm not certain this will solve the problem, but will await further MTT reports to see. Unfortunately, the problem doesn't show up on any manual or script-based tests I have been able to run, even when I duplicate the exact cmd that fails under MTT.
This commit was SVN r20074.
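
The idea in r20074 of forcing all three notifications through a single processor, so that they are handled in arrival order, can be sketched with a simple FIFO. This is an illustration only and not the daemon cmd processor itself; the type and function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* The three notification sources named in the log entry. */
typedef enum {
    EV_WAITPID_FIRED,    /* 1. direct callback from waitpid */
    EV_IOF_COMPLETE,     /* 2. notify event callback from the IOF */
    EV_SYNC_DEREGISTER   /* 3. proc "de-registering" the sync */
} proc_event_t;

#define QUEUE_CAP 16

typedef struct {
    proc_event_t items[QUEUE_CAP];
    size_t head;   /* index of the next event to process */
    size_t count;  /* number of queued events */
} event_queue_t;

/* Append an event; returns 0 on success, -1 if the queue is full. */
int queue_post(event_queue_t *q, proc_event_t ev)
{
    assert(NULL != q);
    if (QUEUE_CAP == q->count) return -1;
    q->items[(q->head + q->count) % QUEUE_CAP] = ev;
    q->count++;
    return 0;
}

/* Remove the oldest event; returns 0 on success, -1 if empty. */
int queue_next(event_queue_t *q, proc_event_t *ev)
{
    assert(NULL != q && NULL != ev);
    if (0 == q->count) return -1;
    *ev = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return 0;
}
```

Because every source posts to the same queue and a single consumer drains it, an event that arrived first is always handled first, which is the ordering property the commit relies on.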
1. coordination of job completion notification to include a requirement for both waitpid detection AND notification that all iof pipes have been closed by the app
2. change of all IOF read and write events to be non-persistent so they can properly be shut down and restarted only when required
3. addition of a delay (currently set to 10ms) before restarting the stdin read event. This was required to ensure that the stdout, stderr, and stddiag read events had an opportunity to be serviced in scenarios where large files are attached to stdin.
This commit was SVN r20064.
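
Point 1 of r20064 (completion requires both waitpid detection AND closure of all IOF pipes) amounts to a two-flag check. The sketch below assumes a per-proc record with two flags, loosely named after the waitpid_fired and iof complete flags mentioned in the log; the struct, enum, and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-proc record; not the ODLS data structure. */
typedef struct {
    bool waitpid_fired;  /* waitpid has reported that the proc exited */
    bool iof_complete;   /* all IOF pipes have been closed by the app */
} proc_status_t;

typedef enum {
    PROC_EV_WAITPID,
    PROC_EV_IOF_CLOSED
} proc_ev_t;

/* Record one event and report whether the proc may now be declared
 * complete. Completion needs both flags, in whichever order they
 * arrive. */
bool proc_record(proc_status_t *p, proc_ev_t ev)
{
    assert(NULL != p);
    switch (ev) {
    case PROC_EV_WAITPID:
        p->waitpid_fired = true;
        break;
    case PROC_EV_IOF_CLOSED:
        p->iof_complete = true;
        break;
    }
    return p->waitpid_fired && p->iof_complete;
}
```

Requiring both flags avoids declaring a proc finished while output is still in flight on its pipes, and equally avoids doing so when the pipes close before the process has actually exited.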
This commit does that by ensuring that daemons retain knowledge of proc location for all procs in their job family. It required a minor change to the ESS API to allow the daemons to update their pidmaps as data was received. In addition, the routed modules have been updated to take advantage of the newly available info, and the encode/decode pidmap utilities have been updated to communicate the required info in the launch message.
This commit was SVN r20022.