handler before the write() and de-register it afterwards. Determine
if the write() succeeded or failed by the return of write().
This commit was SVN r10858.
than $(LN_S). This causes problems with with Windows and probably
elsewhere (re: #200). So use a slightly different trick to get the
right header selected for the MEMCPY and TIMER components.
* Using the same trick used to solve the AC_CONFIG_LINKS problem,
stop using a separate header file for direct calling in the
PML and MTL. This lets me remove some icky code in ompi_mca.m4
that was more fragile than I really liked.
This commit was SVN r10841.
keep the resulting tm_event_t that is generated because the back-end
TM library actually caches a bunch of stuff on it for internal
processing, and doesn't let go of it until tm_poll().
tm_event_t's are similar to (but slightly different than)
MPI_Requests: you can't do a million MPI_Isend()'s on a single
MPI_Request -- a) you need an array of MPI_Request's to fill and b)
you need to keep them around until all the requests have completed.
This commit was SVN r10820.
Jeff: this needs to be back-patched to our supported prior releases. I'll try to verify how far back we need to go - my initial guess is probably all of them
This commit was SVN r10801.
- change -no_oversubscribe to -nooversubscribe (to be similar to
-nolocal)
- Added text to orterun.1 describing slots and -nooversubscribe
Still need to add text about "mpirun a.out" functionality, and RHC
wants to make some minor edits, so committing for synchronization.
This commit was SVN r10800.
Since Jeff and I are going to a branch for T-bird, we have restored the trunk to its prior state to avoid any possibility of disturbing it.
This commit was SVN r10774.
Please report any abnormal behavior during launch, though, as we would like to understand what (if any) impact is seen. I couldn't see any on small jobs (the modulo functions render this number down pretty low).
This commit was SVN r10763.
- orte-clean.c : check to see if the base session directory is empty
and delete it if it is.
- orte_universe_exists.c : Fix a down stread problem resulting from
George's r10718 commit. Don't use the 'fulldirpath' since
that is no longer guarenteed to be the absolute path
to the session directory. Construct this value outside of that
function from the prefix and frontend vars.
This commit was SVN r10741.
The following SVN revision numbers were found above:
r10718 --> open-mpi/ompi@47eef2e002
so that it does not return an error when no universe is passed to it.
Also put back in the 'Slots In Use' column as it is now working properly
per Ralphs recent ras commits. Still not sure what 'Slots Alloc' is meant
to represent, so left that as #if 0'd out for the moment.
This commit was SVN r10739.
The following SVN revision numbers were found above:
r10718 --> open-mpi/ompi@47eef2e002
Update the help text to report errors when not following that rule.
Also updated the RMAPS help text to reflect the reorganization of some of the round-robin code into the base.
The new functionality has been tested under Mac OS-X and on Odin using an MPI program. Both byslot and bynode mapping have been checked and verified. Operational support for other systems needs to be verified - I respectfully request people's help in doing so.
This commit was SVN r10708.
1. Modifies the RAS framework so it correctly stores and retrieves the actual slots in use, not just those that were allocated. Although the RAS node structure had storage for the number of slots in use, it turned out that the base function for storing and retrieving that information ignored what was in the field and simply set it equal to the number of slots allocated. This has now been fixed.
2. Modified the RMAPS framework so it updates the registry with the actual number of slots used by the mapping. Note that daemons are still NOT counted in this process as daemons are NOT mapped at this time. This will be fixed in 2.0, but will not be addressed in 1.x.
3. Added a new MCA parameter "rmaps_base_no_oversubscribe" that tells the system not to oversubscribe nodes even if the underlying environment permits it. The default is to oversubscribe if needed and the underlying environment permits it. I'm sure someone may argue "why would a user do that?", but it turns out that (looking ahead to dynamic resource reservations) sometimes users won't know how many nodes or slots they've been given in advance - this just allows them to say "hey, I'd rather not run if I didn't get enough".
4. Reorganizes the RMAPS framework to more easily support multiple components. A lot of the logic in the round_robin mapper was very valuable to any component - this has been moved to the base so others can take advantage of it.
5. Added a new test program "hello_nodename" - just does "hello_world" but also prints out the name of the node it is on.
6. Made the orte_ras_node_t object a full ORTE data type so it can more easily be copied, packed, etc. This proved helpful for the RMAPS code reorganization and might be of use elsewhere too.
This commit was SVN r10697.
using a pty for everything, which drops all buffered data on the floor when
close() is called on the daemon side, meaning EOF has some issues. Instead,
do the same thing we do for other starters that use the fork() pls -- use
a pipe/fifo for stdin and stderr and a pty for stdout. This is good enough
for what we need and avoids most of the issues with ptys.
This commit was SVN r10692.
Basically, the problem was that the allocator was grabbing everything on the cluster for which the user had access privilege. Thus, if a user had two sessions operable, each with its own allocation, mpirun in each session would grab both sets of nodes and use them. Not very polite.
This commit was SVN r10683.
* num_children should really be an int instead of size_t
since 'size_t' is not signed and num_children can (in rare cases)
drop below 0, and don't want it to roll around to MAX_INT or some
such.
* I figured out that this problem only happened to me because I use
the pls_fork_reap_timeout MCA parameter and thus the only time that
the code in pls_fork_module.c to waitpid is executed is if this is
not set to 0 (I had it set to 1 to give my procs time to exit). I
adjusted the loop from while{...} to do{...}while; so that it is
executed at least once for consistency.
* de-register the SIGCHILD callback for the pid before we attempt
to kill it, so that we don't leave the door open for both the
waitpids (the one in the callback, and the one in this function)
to race to see who can wait on the child.
* Move the 'thread release' to outside the for loop for a bit of an
optimization, and always set the value to 0 since we want to
finish after this function.
* Added a help message for the case when we can't send a kill()
signal to the process. Should never happen, but all is possible
in the wild wild west of HPC.
This commit was SVN r10666.
When we force an application to terminate (via CTRL-C to mpirun)
we send an out-of-band message to the orted to reap its children.
the fork PLS was doing an internal waitpid but never releasing or
updating the information and signaling the condition variable. So
the fork PLS callback for SIGCHLD registered with the event library
and this waitpid are in a bit of a race to 'waitpid' for the children.
Since the PLS callback was the only one that handled the signal properly
when it 'won' then things were great -- as in the normal termination case.
But when it 'lost' -- as in the abnormal termination case -- the orted
never received the proper signal that its children had gone away.
We want to preserve the internal fork PLS callback since it allows
for a timeout while waiting for the child, which the event library
won't do.
This allows both to exist, and behave properly.
This was introduced in r9068.
The ticket is still open since the orted's hang in other situations
still. This is a fix for one of the causes.
This commit was SVN r10662.
The following SVN revision numbers were found above:
r9068 --> open-mpi/ompi@c2c2daa966
/tmp/tm-merge). Validated by RHC. Summary:
- Add --nolocal (and -nolocal) options to orterun
- Make some scalability improvements to the tm pls
This commit was SVN r10651.
with an error status (< 0) then the req buffer is NULL. Put checks around the
OBJ_RELEASE(req) calls so that we don't try to release NULL :/
This commit was SVN r10641.
After seeing the uglyness that is removing directories in the
codebase I decided to push down this to the OPAL by extending the
opal/os_create_dirpath.(c|h) to contain some more functionality.
In this process I renamed 'os_create_dirpath' to 'os_dirpath' since it
is a bit more general now.
Added a few functions to:
- check if an directory is empty
- check to see if the access permissions are set correctly
- destroy the directory at the end of the dirpath
- By using a caller callback function (a la Perl, I believe)
for every file, the caller can have fine grained control over
whether a specific file is deleted or not.
This simplifies things a bit for orte_session_dir_(finalize|cleanup)
as it should no longer contain any of this functionality, but uses
these functions to do the work.
From the external perspective nothing has changed, from the
developer point of view we have some cleaner, more generic code.
This commit was SVN r10640.
Since we don't properly handle connecting/disconecting from multiple
universes, only connect to the first one (or the user specified one).
This is a bug that needs to be fixed, but involves some deep magic in
ORTE.
Print the node segment upon request (-n option).
{{{
Node Name | Arch | Cell ID | State | Slots | Slots Max |
-----------------------------------------------------------
odin001 | | 0 | Unknown | 2 | 4 |
odin002 | | 0 | Unknown | 2 | 5 |
odin003 | | 0 | Unknown | 2 | 6 |
odin004 | | 0 | Unknown | 2 | 7 |
}}}
Since node_slots_alloc and node_slots_inuse are not properly updated
in the GPR don't print those values.
This commit was SVN r10633.
This moves the logic to create the symbolic links for:
- mpirun
- mpiexec
- ompi-ps
- ompi-clean
and their respective man pages to the ompi level from
the orte layer.
This is a bit pedantic, but orte shouldn't be doing the
work of ompi since that is a bit of an abstraction break.
Note: need to autogen.sh to get this. Sorry :(
This commit was SVN r10602.
per a request.
Currently it is not working well. That will soon change
as it just needs a bit of attention and testing to
make it lots-mo-betta.
This commit was SVN r10556.
per a request for its functionality into the main trunk.
This command provides basic information about a running job. It
needs a bit of attention, but works fine in its current iteration.
Please play with it, and lets try to work out all the left over bugs.
Pending action for this tool:
It has been requested that the tool be changed slightly to allow
it to be called via a function call from internal libraries
(e.g. orteconsole).
This commit was SVN r10554.
from the tmp/jjhursey-ft-cr branch.
In this commit we change the way universe names are created.
Before we by default first created "default-universe" then
if there was a conflict we created "default-universe-PID"
where PID is the PID of the HNP.
Now we create "default-universe-PID" all the time (when
a default universe name is used). This makes it much
easier when trying to find a HNP from an outside app
(e.g. orte-ps, orteconsole, ...)
This also adds a "search" function to find all of the
universes on the machine. This is useful in many contexts
when trying to find a persistent daemon or when trying to
connect to a HNP.
This commit also makes orte_universe_t an opal_object_t,
which is something that needed to happen, and only effected
the SDS in one of it's base functions.
I was asked to bring this over to aid in fixing orteconsole
and orteprobe. Due to the change of orte_universe_t to
an object orteprobe may need to be updated to reflect this
change. Since orteprobe needs to be looked at anyway I'll
leave this to Ralph to take care of.
*Note*:
These changes do not depend upon any of the FT work (but
the FT work does depend upon them). These were brought over
to help in fixing some of the ORTE tool set that require
the functionality layed out in this patch.
Testing:
Ran the 'ibm' tests before and after this change, and all was
as well as before the change. If anyone notices additional
irregularities in the system let me know. But none are expected.
This commit was SVN r10550.
SIGUSR2. This can be extended later if needed to include other
signals we should forward to the user processes (TSTP and CONT,
perhaps?)
* Since the signal handlers don't actually run in signal context, we
can use malloc/fprintf/etc. So clean up some of the signal handler
code so that we don't keep message buffers around for the life of
the process
This commit was SVN r10496.
This commit will apply cleanly to the v1.1 branch, and should
be moved over once I get someone to verify it.
The problem is outlined in the bug. The fix was to move the
setting of the app context index (idx) before we put it in the
GPR so that it is propogated to the gpr.
The reason this hasn't bitten us before is because we init
app->idx to 0, which is true most of the time. Except that is
when MPI_Comm_spawn_multiple in which we put in more than
one app context, thus care about correct indexing.
This was causing down the line memory corruption by overrunning
the mapping array. This commit also puts in a check to make
sure that we error out if we ever try to do that again.
This commit was SVN r10380.
mpirun/orterun now has an option to print the version number. If -V/--version
is given, it will print the version number. If it's the only option, we
exit cleanly. Otherwise, we continue on as if --version wasn't given
(except we've printed the version number).
--This line, and th se below, will be ignored--
M orte/tools/orterun/orterun.c
M orte/tools/orterun/help-orterun.txt
This commit was SVN r10276.
1. Changed the RMGR and PLS APIs to add "signal_job" and "signal_proc" entry points. Only the "signal_job" entries are implemented - none of the components have implementations for "signal_proc" at this time. Thus, you can signal all of the procs in a job, but cannot currently signal only one specific proc.
2. Implemented those new API functions in all components except xgrid (Brian will do so very soon). Only the rsh/ssh and fork modules have been tested, however, and only under OS-X.
3. Added signal traps and callback functions for SIGUSR1/2 to orterun/mpirun that catch those signals and call the appropriate commands to propagate them out to all processes in the job.
4. Added a new test directory under the orte branch to (eventually) hold unit and system level tests for just the run-time. Since our test branch of the repository is under restricted access, people working on the RTE were continually developing their own system-level tests - thus making it hard to help diagnose problems. I have moved the more commonly-used functions here, and added one specifically for testing the SIGUSR1/2 functionality.
I will be contacting people directly to seek help with testing the changes on more environments. Other than compile issues, you should see absolutely no change in behavior on any of your systems - this additional functionality is transparent to anyone who does not issue a SIGUSR1/2 to mpirun.
Ralph
This commit was SVN r10258.
of $libdir and $bindir (i.e., was correctly doing local launches, but
was still using $prefix/lib and $prefix/bin for remote launches).
[Re-]Fixes OFED bug 59.
This commit was SVN r10207.
The following SVN revision numbers were found above:
r9930 --> open-mpi/ompi@1d6902296c
We were stuck in an infinite loop inside the rmaps round_robin
component when the user specified a host, then over subscribed it.
Instead of retuning an error, we looped forever.
For example:
$ cat hostfile
A slots=2 max-slots=2
B slots=2 max-slots=2
$ mpirun -np 3 --hostfile hostfile --host B
<hang>
The loop would not terminate because both host A and B are in the
'nodes' structure as they are both allocated to the job. However,
after allocating 2 slots to host B, we remove it from the node list
leaving us with a 'nodes' structure with just A in it. Since we can't
use host A, we keep looping here until we find a node that we can use.
This patch checks to make sure that if we get into this situation where
rmaps is looping over the list a second time without finding a node
during the first pass then we know that there are no nodes left to
use, so we have a resource allocation error, and should return to the user.
This patch should be moved to all of the release branches
This commit was SVN r10131.
pipeline. See lengthy comment in iof_base_endpoint.c for the details, but
the short version is that we shouldn't set O_NONBLOCK on standard I/O
file descriptors, so we no longer do.
Closes ticket:9
This commit was SVN r9966.
option as discussed on the devel-core mailing list. The Big
Difference is that instead of hard-coding the strings "/lib" and
"/bin" in to append to the prefix, we append the basename of the local
libdir and bindir. Hence, if your libdir is $prefix/lib64, we'll
append /lib64 to construct the remote node's LD_LIBRARY_PATH (etc.).
Also appended the orterun.1 man page to include a description of
--prefix, how it is constructed, what it handles / what it does not,
etc.
This commit was SVN r9930.
exit out, rather than trying to have the pls exit. Since singletons
weren't started with a pls, there's no way the pls is going to be
able to kill the process. So just exit and save the error message.
This commit was SVN r9859.
in San Jose. Allow the configure option --disable-binaries to build OMPI,
but not build or install the support binaries (so basically, just build
the libraries).
This commit was SVN r9777.
as the smaller machine's SIZE_MAX won't be SIZE_MAX on the bigger machine, which
can lead to failures along the way -- in this case, with GPR triggers being
improperly fired.
This commit was SVN r9776.
is interpreted as a shutdown of the io channel on the next iteration.
Definitively not the good approach. The correct condition is
bigger than 0.
This commit was SVN r9770.
copy of the receive buffer based on the iovec struct that may have been updated
during partial reads to reflect the current offset. Need to make the copy using
the base address of the buffer.
Thanks to Sven Stork for finding this.
This should be backported to 1.0.X and 1.1.X branches.
This commit was SVN r9749.
automagically bring in the libraries through the top-level library (so
liborte automatically brings in libopal, etc.). Otherwise, we get some
warnings on Solaris
This should go to the v1.1 branch
This commit was SVN r9666.
- The constant 1 is a signed int by default. Explicitly say that
it is an unsigned value so we can't overflow
- Fix unreachable statement warnings in dss_arith by breaking out
of switch statements instead of returning - this should have
no impact on performance, since it's a non-conditional jump
- A couple of the GPR files had carriage returns and were in
DOS mode - put them in unix mode...
These should all probably go to the v1.1 branch...
This commit was SVN r9664.
Copy the Doxyfile to orte so people can generate just that documentation. Adjust the properties on that directory so it ignores the resulting doxygen output tree.
This should go over to the release branch(es)
This commit was SVN r9625.
thread, which will do progress independently of MPI. So in this case we
have to call opal_event_loop instead of opal_progress.
This commit was SVN r9551.
event library (since the event library has its own thread). So when
we are using progress threads, we really want to call opal_event_loop()
and not opal_progres().
This commit was SVN r9549.
* Add a platform spec for using the portals reference implementation's
RTE instead of our own to make local testing easier.
* Add a cnos rmgr component so that 1) we don't have to build nearly
as many components (no need for ras,rds,pls,etc.) and 2) calls
to MPI_ABORT() won't print error messages about not being able to
contact the daemon. Still need to fill in some of the terminate
stuff with calls from cnos, but will come in time.
* Make gpr_null use the base code for creating value and keyval
structures so that we don't segfault in ompi_mpi_init().
This commit was SVN r9510.
to the defines. As our internal types (job_id and co.) are unsigned that generate
several errors (integer overflow in expression and comparison between signed and
unsigned). Casting the defines to the correct type solve these problems.
This commit was SVN r9481.
for the type to be packed / unpacked when dealing with sized types (like
size_t) so that the dss_unpack code to deal with types of different sizes is
activated. Necessary for proper 32/64 interoperability.
This commit was SVN r9475.
* cleanup the unpack_size_mismatch macros a little bit
* ad comment about endianness of size_mismatch cleanup code so that I don't
think I've found a bug that really isn't and lose an hour tracking it down
again...
This commit was SVN r9458.
NOTE: JEFF SHOULD CHECK THIS!
I found that orterun was not tracking the index number of the app_contexts it was creating. Hence, the app_context->idx field was always sitting at zero. This index is used by the mapper to decide which app_context to use for each process - thus, with the value of each index being zero, the mapper only used the first app_context that was created. All others were ignored.
Not sure when this might have gotten changed. Could be it was a problem that always existed, but didn't get exposed until something else was changed.
Anyway, it seems to work now - could stand further testing.
This commit was SVN r9389.
* Don't do the .in -> .tmp -> header thing for the prefixes and versions.
It causes some severe cleanup issues all to save 4 files from rebuilding
when configure is run.
* Clean up some makefiles so it's clear what is being installed/disted
This commit was SVN r9260.
SLURM. Currently, this includes AIX and Linux. If the user wants to build
SLURM on another platform, they can specify --with-slurm.
* Enable/disable the SLURM sds component using the same logic as the PLS and
RDS components.
This commit was SVN r9259.
installation directories) in configure, the files that depend on this
information are not properly rebuilt. If you need this information,
don't setup a -D in the Makefile.am - instead, include
opal/install_dirs.h.
* Use the : option in AC_CONFIG_FILES to avoid needing to expose that
we are playing around with temporary files with our headers to avoid
rebuilding
* Clean up the version file information a bit, and like the install
directory stuff, make sure that there is a dependency so that
ompi_info gets rebuilt properly when a version number changes.
This commit was SVN r9256.
(reported by Cisco)
- While in orterun, add a feature that multiple users have asked for:
if you specify an absolute pathname to orterun, such as
"/path/to/bin/orterun ...", it's equivalent to "orterun --path
/path/to ..."
This commit was SVN r9181.
- moved hton64 and ntoh64 from the bunch of places it had been copied
into one header file
- properly set and use the btl_tcp's nbo option to put things in
network byte order on the wire if both sides don't have the same
endianness
- Put the OB1 PML's headers (with a couple exceptions I need to discuss
with Tim) in network byte order on the wire if both sides don't have
the same endianness
- since it was needed for the TCP BTL, move the orte_process_name_t
HTON and NTOH macros from the TCP OOB to ns_types.h
This commit was SVN r9145.
to match the receiving process's setup. sizeof(bool) is different on
i386 OS X and PowerPC OS X, so need this to do endian testing between
the two
This commit was SVN r9140.
- Explained <program> and made a consistancy change in the Quick Start section.
- Change references to 'app schema' to Open MPI 'app context'
- Audit the command line arguments for --foo, -foo stuff.
This commit was SVN r9097.
and cwd update functionality. For bproc, we *do* need to change
directories while checking the cwd because argv[0] may be expressed as
a relative path, and therefore needs to be checked from the cwd
expressed in the app context.
This commit was SVN r9084.
This needs to be gone over a few more times before it is allowed to see
daylight, but has come a long way. Some sections may be off more than a little,
but the general idea is there.
Need to audit to make sure we don't call the ORTE VHNP's daemons :)
This commit was SVN r9078.
Finished adding and pruning all the the Options.
Cleaned up a bunch of man syntax, so it should be 'more' readable (making the
assumption that man source is ever readable :p).
I am moving on to the "description" and "see also" sections next.
This commit was SVN r9077.
argv[0] and the cwd on the target node (i.e., the node where the
executable will be running in all systems except BProc, where the
searches are run on the node where orterun is invoked).
- fork pls now does cwd and argv[0] search in orted
- bproc pls does cwd and argv[0] search in orterun
- cwd behavior slightly different:
- if user specifies a -wdir to orterun, we chdir() to there; if we
can't for some reason, abort
- if user does not specify a -wdir, try to chdir() to the dir where
orterun was invoked. If we can't for some reason (e.g., it
doesn't exist on the target node), then try to chdir($HOME). If
we can't do that, then just live with whatever default directory
we were put in.
This commit was SVN r9068.
really needs to be documented (in part so that users stop asking us
how to do it!).
This is a first cut at an orterun.1 man page. It is 95% copied from
LAM's mpirun.1 lam page -- I just edited the very top and am handing
this off to Josh to finish the first cut. Then we'll add specific
docs about the behavior of some of the finer details. This is not
listed in the Makefile.am yet because it's so incomplete/incorrect
(w.r.t. OMPI), so I don't want it included in the tarball or installed
[yet].
This commit was SVN r9058.
- move files out of toplevel include/ and etc/, moving it into the
sub-projects
- rather than including config headers with <project>/include,
have them as <project>
- require all headers to be included with a project prefix, with
the exception of the config headers ({opal,orte,ompi}_config.h
mpi.h, and mpif.h)
This commit was SVN r8985.
The INIT counter is supposed to be adjusted when the processes are mapped - this is now done correctly.
The LAUNCHED counter is supposed to be adjusted when the pls sets the process pid info into the registry and changes the state to LAUNCHED. This could probably be changed to have that function use the set_proc_soh API, but this fixes the problem for now.
Thanks to Brian for finding that the triggers were not being fired.
This commit was SVN r8948.
cleanup code in the signal part of the event library
* Only attempt to forward standard input if we have a controlling terminal
(isatty() returns 1) and we are the foreground process OR we do not have
a controlling terminal (isatty() returns 0). If we have a controlling
terminal, check at each SIGCONT if we should change our forwarding,
since our foreground / background status may have changed.
Unfortunately, there isn't a great way in the iof framework to know if
we are capturing a starter's stdin. Use the logic that if it's a source
AND tagged as standard input, it's a starter's stdin. This seems to
work for all the common usages.
Both these need to go to the v1.0 branch.
This commit was SVN r8894.
the svc component so that it can disable the rml exception callback, fixing
a race condition in the shutdown mechanism of orte.
This should probably go to the v1.0 branch.
This commit was SVN r8893.
discards all of the data in the pty that hasn't been read. This was
leading to data being discarded when files were redirected into
mpirun and read by rank 0 of the job. This was very "not good".
The decision to not use ptys for stdin was made based on what Tim said
that LA-MPI was doing.
This needs to go to the v1.0 branch... Tim should probably review...
This commit was SVN r8892.
r8698), with changes below:
- Split wrapper flags into those required for each of the three projects,
and cleaned up some cruft (including the LIBMPI_EXTRA_*FLAGS) through-
out the build system
- Added opal_init_util and opal_finalize_util to allow init / cleanup
of all the opal code that doesn't require the MCA system
- Create standalone key=value file parser, based on the one that used
to be in the mca param parser, so that it can be shared in multiple
places
- Add wrapper datafiles for opal, orte, and ompi wrappers, and add
wrapper compiler with support for all the old features
This commit was SVN r8699.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r8690
r8698
intended to include the OMPI_DEBUG_ZERO call).
These debugging statements should not have affected correcteness
because the value of 78 will be overridden in the read() and the
assert()/abort() stuff will only be triggered on an error which should
never happen (i.e., the error should have been handled by the prior if
conditional). But still, thise code should not be there.
This commit was SVN r8649.
The following SVN revision numbers were found above:
r8643 --> open-mpi/ompi@a6b869ed68
(windows). Instead use the LN_S variable exported by the Makefile (set to
"ln -s" on all Unixes and to "cp -p" on windows).
When we remove an executable use the correct extension for its name
(add $(EXEEXT) to the name).
This commit was SVN r8616.
and path->agent_path so that it's totally clear what these are for
- make a new rsh component param for agent_param (the value from the
MCA param)
- delay the path check for the agent until the component init -- don't
make it fail during open, because the MCA base will print a warning
if a component fails open() (e.g., on clusters without rsh/ssh (!),
this component was failing noisly even though it was
normal/expected)
This commit was SVN r8596.
In case of checking for Shell with --mca pls_rsh_assume_same_shell 0
have the node point to sensible values.
This commit was SVN r8563.
The following SVN revision numbers were found above:
r7664 --> open-mpi/ompi@0629cdc2d7
now take a colon-delimited list of agents (and associated argv). Also
change the default value to "ssh : rsh". Hence, if we run on a
cluster that does not have ssh, we'll fall back to rsh. If we can't
find rsh, then the rsh component will disqualify itself from
selection.
This commit was SVN r8514.
- Need to make sure that SIZE_MAX exists as a constant if stdint.h
doesn't exist
- struct timeval is defined in unistd.h on IRIX, so need to include
that headerfile where ever struct timeval is used.
This commit was SVN r8361.
- when eof is reached at orterun, send a 0 byte message to peer indicating eof
- on receipt of zero byte message - close corresponding file descriptor associated with the endpoint
- require setup ptys for stdin and stdout so that stdin can be closed independently of stdout
This commit was SVN r8264.
debugger scheme described in
http://www.open-mpi.org/community/lists/users/2005/10/0214.php. This
makes our user-level debugger scheme much more vendor-independent
(although the "-tv" option will still work for backwards compatibility
-- it'll just be a synonum of "--debug").
This commit was SVN r8206.
* turns out (duh!) that there was a reason that the <projectdir>dir
variable was set in the AM conditional. If not, stupid directories
are created and not needed... duh.
This commit was SVN r8205.
component/base Makefile.am files, reducing the time configure spends
stamping out Makefiles at the end
* Install base_impl.h file when devel-headers are being installed
This commit was SVN r8200.
When compiling C++ code that includes something that looks for the C++
header file "memory" (stupid C++ headers not having .h extensions), it
goes through the header file search path, which includes $(topsrcdir)/opal,
so it finds the directory $(topsrcdir)/opal/memory/ and tries to load
that as the memory header file and all goes downhill.
This commit was SVN r8111.
spining in orte_iof_base_flush() when running
intel_tests/src/MPI_Errhandler_fatal_c
When we close an endpoint by taking it out of the envent handler, we need to make
sure that it fits the criteria to pass through orte_iof_base_flush(), specificly
make sure we clean out the ep_frags list.
Note: This is more of a sanity check, since the endpoint should already be
in this state at the point of closure.
Secondly in orte_iof_base_endpoint_read_handler(), if we determine that it is
necessary to close the endpoint we have to "return" after doing so, otherwise
we add another frag to the endpoint which will cause it to hang in
orte_iof_base_flush().
Bug go squish!
This commit was SVN r8109.