Commit graph

15676 commits

Author SHA1 Message Date
Edgar Gabriel
ad9f793ce4 Avoid calling ompi_dpm.mark_dyncomm if the size of the local communicator
is zero. The routine assumes that at least one process is available in the
group, which led to a segfault when creating communicators with MPI_GROUP_EMPTY.

Fixes trac:2752

This commit was SVN r24595.

The following Trac tickets were found above:
  Ticket 2752 --> https://svn.open-mpi.org/trac/ompi/ticket/2752
2011-03-31 19:57:06 +00:00
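A minimal C sketch of the caller-side guard this commit describes. The types and functions (`struct group_sketch`, `first_proc()`, `maybe_first_proc()`) are hypothetical stand-ins, not the real `ompi_dpm` interface: the callee reads the first group member unconditionally, so the caller skips the call when the group is empty.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a local group: member ids plus a count. */
struct group_sketch {
    const int *procs;
    size_t size;
};

/* Hypothetical callee that, like the routine in the commit, assumes the
 * group has at least one member and reads procs[0] unconditionally. */
static int first_proc(const struct group_sketch *g)
{
    return g->procs[0];
}

/* Caller-side guard: skip the call for an empty group (for example one
 * created from MPI_GROUP_EMPTY). Returns -1 when the call is skipped. */
static int maybe_first_proc(const struct group_sketch *g)
{
    if (g->size == 0) {
        return -1;
    }
    return first_proc(g);
}
```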
Terry Dontje
266e663091 Add opal_tree class. This will be used in the future by sysinfo to store hw maps to be used by rmaps for the new affinity code.
This commit was SVN r24594.
2011-03-30 08:05:28 +00:00
Ralph Castain
30fb002524 Take the first small step towards rationalizing rsh support. Create a new "rshbase" component that contains a simple rsh module - no tree spawn, uses all the base functions for launch support. Extend the base rsh support functions to include those functions in common across all rsh modules.
Only a minor change made to the current rsh module to avoid a naming conflict. Otherwise, left it alone to avoid creating conflicts with other external work. The current rsh module remains the default for rsh/ssh support, and continues to contain the support for SGE and Loadleveler.

This commit was SVN r24593.
2011-03-30 01:15:07 +00:00
Nysal Jan
866ae8b43a Close the file descriptor
This commit was SVN r24580.
2011-03-29 08:42:49 +00:00
Nysal Jan
c8c6b0edab Improve LoadLeveler integration with Open MPI. Add support for LL native rsh agent - llspawn
This commit was SVN r24579.
2011-03-29 07:46:59 +00:00
Ralph Castain
f40edd6b4f Add the stupid test word
This commit was SVN r24578.
2011-03-26 03:38:59 +00:00
Nathan Hjelm
8634b6394f fixed plm/tm component
This commit was SVN r24577.
2011-03-25 22:20:15 +00:00
Ralph Castain
5bfb01c6c8 Only build the linux component of sysinfo if linux is the operating system.
Thanks to Paul Hargrove for the suggestion.

This commit was SVN r24576.
2011-03-25 20:55:57 +00:00
Matthias Jurenz
53346a9c1a - fixed handling of a NULL pathname passed to certain I/O calls (e.g. fopen, open, unlink)
- incremented version number

This commit was SVN r24575.
2011-03-25 11:15:49 +00:00
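A sketch, assuming the wrappers around these I/O calls record the pathname in a trace event before forwarding the call; `trace_pathname()` is a hypothetical helper, not code from the tracing library. The point is that the wrapper itself must never dereference a NULL pathname.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper for an I/O wrapper: choose the name to record in
 * the trace event. A NULL pathname is never dereferenced here; the
 * wrapped call (fopen, open, unlink, ...) still receives the caller's
 * original argument and behaves exactly as it would without tracing. */
static const char *trace_pathname(const char *pathname)
{
    return pathname != NULL ? pathname : "(null)";
}
```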
Jeff Squyres
58a13f87e6 Oops -- forgot to add opal_config_top.h to Makefile.am (so that it'll
be included in the tarball).

This commit was SVN r24572.
2011-03-25 01:21:11 +00:00
Jeff Squyres
5ae1b15b6e Ensure that other packages defining PACKAGE_ macros don't hurt us, and protect others from our PACKAGE_ macros.
This commit was SVN r24571.
2011-03-24 22:39:56 +00:00
Ralph Castain
d7e029cb40 Convert heartbeat to multicast basis
This commit was SVN r24570.
2011-03-24 19:05:39 +00:00
Jeff Squyres
cf6c5e8d48 Fix a bug noted by Gus Correa on the users list: mpi_paffinity_alone
appeared multiple times in ompi_info output (so did others, but this
is the one that was noticed).  Ensure that we don't call
opal_paffinity_base_register_params() multiple times.

This commit was SVN r24569.
2011-03-24 00:58:25 +00:00
Ralph Castain
90698a2c02 Ensure that blocking recvs wait until the data is actually recvd
This commit was SVN r24558.
2011-03-22 18:45:54 +00:00
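A minimal sketch of a blocking receive that returns only after the data has arrived, assuming a hypothetical `struct recv_slot` shared with a delivering thread. It uses a POSIX condition variable and loops on a completion flag; ORTE's actual RML code may block differently (for example by progressing its event loop), so treat this as an illustration of the invariant rather than the implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical receive slot shared between the caller and a delivering thread. */
struct recv_slot {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int complete;        /* set only after the payload has been copied in */
    char data[64];
};

static void recv_slot_init(struct recv_slot *s)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->complete = 0;
    s->data[0] = '\0';
}

/* Called by the delivering side once the message has fully arrived. */
static void recv_slot_deliver(struct recv_slot *s, const char *msg)
{
    pthread_mutex_lock(&s->lock);
    strncpy(s->data, msg, sizeof(s->data) - 1);
    s->data[sizeof(s->data) - 1] = '\0';
    s->complete = 1;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

/* Blocking receive: loop on the completion flag rather than trusting a
 * single wakeup, so a spurious wakeup cannot return before the data is in. */
static void recv_slot_wait(struct recv_slot *s)
{
    pthread_mutex_lock(&s->lock);
    while (!s->complete) {
        pthread_cond_wait(&s->cond, &s->lock);
    }
    pthread_mutex_unlock(&s->lock);
}

/* Thread entry point that delivers a fixed payload into the slot. */
static void *deliver_thread(void *arg)
{
    recv_slot_deliver(arg, "payload");
    return NULL;
}
```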
Ralph Castain
888472f671 Do not release recv as the calling function needs that data and will release it later
This commit was SVN r24557.
2011-03-22 18:44:56 +00:00
Ralph Castain
a3b0a9fcb7 Update platform file
This commit was SVN r24556.
2011-03-22 18:28:12 +00:00
Ralph Castain
30981de200 Minor cleanups courtesy of Nysal - thanks!
This commit was SVN r24552.
2011-03-22 13:48:58 +00:00
Josh Hursey
045035963a Fix return code from MPI_Probe and MPI_Iprobe.
Instead of returning MPI_SUCCESS every time they are called, regardless of the status of the call, they should return a value representative of the action. As with MPI_Wait/MPI_Test, they will return MPI_SUCCESS if the action was successful, or the value of status.MPI_ERROR for the operation if it was unsuccessful.

This was discussed on the ompi-devel list (http://www.open-mpi.org/community/lists/devel/2011/03/9109.php).

This commit was SVN r24551.
2011-03-22 13:29:29 +00:00
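A sketch of the return-code rule described above, using hypothetical stand-ins (`struct status_sketch`, `SKETCH_SUCCESS`, `SKETCH_ERR_EXAMPLE`) instead of the real MPI types so that it compiles without an MPI library: success maps to the success code, and failure propagates the status's `MPI_ERROR` value.

```c
#include <assert.h>

#define SKETCH_SUCCESS 0      /* stand-in for MPI_SUCCESS */
#define SKETCH_ERR_EXAMPLE 1  /* arbitrary hypothetical error code */

/* Hypothetical status type mirroring the MPI_ERROR field of MPI_Status. */
struct status_sketch {
    int MPI_ERROR;
};

/* Instead of always returning success, return the per-operation error
 * stored in the status when the probe itself did not succeed. */
static int probe_return_code(int operation_succeeded,
                             const struct status_sketch *status)
{
    return operation_succeeded ? SKETCH_SUCCESS : status->MPI_ERROR;
}
```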
Ralph Castain
c1396b278c Resolve the rsh confusion by splitting the initial search for a launch agent from the actual setup of the launch agent values in the plm base globals. Have each aspiring rsh-clone call lookup to see if its desired launch agent is available - if not, then reject that plm component.
If so, then set up the actual launch agent values only when the module init function is called.

This resolves the current conflict between the rsh and rshd components. Hopefully, it may avoid future problems in this area -provided- any new uses of rsh-like launchers abide by the lookup-and-then-setup rule.

This commit was SVN r24550.
2011-03-22 02:23:09 +00:00
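A sketch of the lookup-and-then-setup rule, with hypothetical names (`lookup_agent()`, `setup_agent()`, `launch_agent_path`) and a simplified colon-separated search path standing in for the real plm base code. The lookup only reports whether the agent exists (a component would reject itself on 0) and never writes the shared global; only the init of the selected component calls the setup step.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical "base global" that only the selected component may fill in. */
static char launch_agent_path[4096];

/* Lookup (query time): search a colon-separated list of directories for an
 * executable named `agent`. Writes only to `out`, never to the global.
 * Returns 1 if found, 0 otherwise (the component should then reject itself). */
static int lookup_agent(const char *agent, const char *search_path,
                        char *out, size_t outlen)
{
    char dirs[4096];
    snprintf(dirs, sizeof(dirs), "%s", search_path);
    for (char *dir = strtok(dirs, ":"); dir != NULL; dir = strtok(NULL, ":")) {
        snprintf(out, outlen, "%s/%s", dir, agent);
        if (access(out, X_OK) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Setup (module init): only now copy the result into the shared global. */
static void setup_agent(const char *found_path)
{
    snprintf(launch_agent_path, sizeof(launch_agent_path), "%s", found_path);
}
```

Keeping the lookup free of side effects means two rsh-like components can both be queried without one overwriting the other's settings.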
Ralph Castain
d17b50e1ff Add the appropriate hooks to tell Totalview to display the user's main program upon startup. Apparently, this hook got lost somewhere after the 1.2 series :-(
Thanks to David Turner and the TV folks for passing this along.

This commit was SVN r24549.
2011-03-21 17:40:58 +00:00
Ralph Castain
795ca2cff2 Complete implementation of the multicast-based grpcomm module
This commit was SVN r24548.
2011-03-20 01:18:06 +00:00
Ralph Castain
fa40f5d7c3 Fix bad formatting
This commit was SVN r24547.
2011-03-20 01:17:29 +00:00
Ralph Castain
281116ddc5 A max_restarts value of -1 is now valid and indicates infinite restarts, so correct the validity check
This commit was SVN r24546.
2011-03-20 01:17:00 +00:00
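A sketch of a validity check that accepts -1 as "infinite", plus a hypothetical `may_restart()` helper showing how such a value might be consumed; neither function is the actual ORTE code.

```c
#include <assert.h>

/* -1 means "restart forever"; 0 or more is a finite limit. */
static int max_restarts_is_valid(int max_restarts)
{
    return max_restarts >= -1;
}

/* Hypothetical consumer: may a process that has already been restarted
 * `restarts_so_far` times be restarted again? */
static int may_restart(int max_restarts, int restarts_so_far)
{
    if (max_restarts == -1) {
        return 1;
    }
    return restarts_so_far < max_restarts;
}
```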
Eugene Loh
2770a12beb Continue clean up of thread options started in r22841, 22842, and 22849.
No need for any CMRs to 1.5... that was already done in CMR 2728.

This commit was SVN r24545.

The following SVN revision numbers were found above:
  r22841 --> open-mpi/ompi@b400b84162
2011-03-18 21:36:35 +00:00
Matthias Jurenz
c34eed80c6 Fixed typo in configure options
This commit was SVN r24544.
2011-03-18 14:42:49 +00:00
Jeff Squyres
82f9474fec Revert r24533 and r24507 until the compile errors can be fixed.
This commit was SVN r24541.

The following SVN revision numbers were found above:
  r24507 --> open-mpi/ompi@4ce1936fed
  r24533 --> open-mpi/ompi@3204af2d36
2011-03-18 13:33:02 +00:00
Jeff Squyres
733fa92ab8 Improve the search/replace scripty foo a bit: don't traverse into .hg
and .git directories.

This commit was SVN r24540.
2011-03-18 12:41:46 +00:00
Jeff Squyres
dd2e57f41c Make the HTML man page script a little more robust
This commit was SVN r24539.
2011-03-17 15:09:23 +00:00
Shiqing Fan
aac0db05bb Add support for Intel Fortran compiler 12 on Windows.
This commit was SVN r24538.
2011-03-17 12:08:13 +00:00
Jeff Squyres
bffa5c8f7e * Rename OMPI_CHECK_PTHREAD_PIDS to OPAL_CHECK_PTHREAD_PIDS.
* Convert from AC_TRY_RUN to AC_RUN_IFELSE.
 * Excellent suggestion from Paul Hargrove: use AC_CHECK_FUNC to look
   for a Linuxthreads-specific symbol when we're cross compiling to
   see if threads will have different PIDs (because AC_CHECK_FUNC
   works properly even when in cross-compiling environments).

Background: the old/Linuxthreads-based pthreads implementation used
the Linux clone() call to make threads, which effectively meant that
each thread had a different PID.  The new NPTL pthreads implementation
does things better, meaning that threads have the same PID.  

Open MPI no longer supports threads with different PIDs -- we ripped
out the supporting code for threads with different PIDs because we
don't have systems available to test this on anymore (anyone who still
has such a system can still use older versions of Open MPI).  Hence,
configure needs to determine whether the target system will have the
same PID for threads or not -- even if we're cross-compiling.  The
current test compiles and runs a multi-threaded app that checks PIDs
of different threads, but we clearly can't do that in a
cross-compiling environment.  So use AC_CHECK_FUNC in cross-compiling
environments.

Simple, no?

This commit was SVN r24537.
2011-03-17 11:59:54 +00:00
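A minimal sketch of the kind of runtime check the configure test performs in native builds: start a thread, record `getpid()` inside it, and compare it with the main thread's PID. `threads_share_pid()` is a hypothetical name, not the macro's actual test program; on NPTL-based Linux and other modern systems it is expected to return 1. As the commit notes, a program like this cannot run when cross-compiling, which is why AC_CHECK_FUNC is used there instead.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

static pid_t thread_pid;

/* Thread body: remember the PID this thread sees. */
static void *record_pid(void *arg)
{
    (void)arg;
    thread_pid = getpid();
    return NULL;
}

/* Returns 1 if a new thread reports the same PID as the main thread
 * (NPTL behaviour), 0 if it differs (old LinuxThreads behaviour),
 * and -1 if the thread could not be created. */
static int threads_share_pid(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, record_pid, NULL) != 0) {
        return -1;
    }
    pthread_join(t, NULL);
    return thread_pid == getpid() ? 1 : 0;
}
```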
Ralph Castain
ee68cd102c Fix the hier grpcomm module so modex results in correct data. The prior implementation stored the modex data as node-based attributes. This worked fine for BTLs such as openib where the interfaces were associated with the node. However, BTLs such as TCP have interfaces associated with a specific process, not a node. Thus, store the data in the modex database so it is correctly indexed.
This commit was SVN r24536.
2011-03-17 02:22:23 +00:00
George Bosilca
13d2998d54 When the BTL TCP is trying to connect to a peer, output its process name
in addition to the rest of the information.

This commit was SVN r24534.
2011-03-16 20:20:14 +00:00
Mike Dubman
3204af2d36 * temporary fix for ib btl compilation with old ofed versions 1.3.x.
This commit was SVN r24533.
2011-03-16 17:53:51 +00:00
Ralph Castain
d5dfe05521 Remove stale code associated with OPAL_THREADS_HAVE_DIFFERENT_PIDS. In the past, we have supported the case of really, really old Linux kernels where threads have different pids. However, when we updated the event library, we didn't also update that support code. In addition, when we dropped progress thread support, we didn't remove areas of the code that could no longer be compiled (i.e., were protected by "if progress thread && if have different pids").
There was no compelling reason to support such old kernels. Accordingly, convert the test to print a nice error message indicating we no longer support old kernels (but indicate that earlier OMPI versions do) and error out. Remove all code that was protected by "if have different pids" since it can no longer be compiled.

This commit was SVN r24531.
2011-03-15 21:05:03 +00:00
Ralph Castain
7eede54b39 Solve a problem when cross-compiling for PPC32 - in this case, OPAL_HAVE_ATOMIC_CMPSET_64 is not set, but the code requires that the ADD_64 and SUB_64 values at least be defined.
This commit was SVN r24528.
2011-03-15 15:50:49 +00:00
Ralph Castain
a8c1a3b4ee Update platform file
This commit was SVN r24527.
2011-03-14 18:44:09 +00:00
Ralph Castain
de092af8ef Add a little more debug
This commit was SVN r24526.
2011-03-14 18:43:49 +00:00
Ralph Castain
ebabe9c83a Forgot that Terry wanted to control the vm launch with an mca param - set one up for that purpose
This commit was SVN r24525.
2011-03-13 00:46:42 +00:00
Ralph Castain
dc6f616599 Enable VM launch.
For some time, ORTE has had the ability to launch daemons on all nodes prior to launching an application. It has largely been used outside of the OMPI community, and so was never explicitly turned "on" inside OMPI releases. Nevertheless, the code has been there.

Allowing VM launches does not require ANY changes to existing PLM components. All that was required was to have orterun launch the daemons as a separate call to orte_plm.spawn -prior- to launching the applications. The rest of the VM support code resides in the rmaps framework:

(a) a check when asked to map a job to see if it is the daemon job, and

(b) a separate "setup_virtual_machine" mapper in the rmaps base that creates the required map so the PLMs will do the right thing.

In order to support those users who have no RM allocation but like to give the allocation in the form of a -host or -hostfile argument to their application, there is a little more code in orterun and the setup_virtual_machine mapper to capture information passed in that manner.

This has been tested with rsh and slurm environments, and, since there is nothing environment-specific in the implementation, should work in others as well - but needs to be proven.

This commit was SVN r24524.
2011-03-12 22:50:53 +00:00
Ralph Castain
80265b472e Avoid direct reference of pointer_array elements
This commit was SVN r24523.
2011-03-12 20:18:51 +00:00
Ralph Castain
3e2c836e51 Initial cut at integrating new mapper capabilities into comm_spawn. Support specification of a mapper to use, and setting of npernode value. Other info flags can also be defined, but these will serve as examples for now - someone who wants to extend this to all the available mapping controls is welcome to do so.
This commit was SVN r24522.
2011-03-12 15:39:56 +00:00
Ralph Castain
df82e4cd36 Plug a memory leak
This commit was SVN r24521.
2011-03-12 15:37:33 +00:00
Ralph Castain
1297acde13 George raised some valid concerns about the extensibility of the revised rmaps framework. Address those by:
1. removing the enum of mapper values

2. changing the req_mapper and last_mapper fields to char* so they can hold the component name instead of a mapper flag

3. revising the selection logic in the mapper components to reflect the change. Components now look for their name in the req_mapper field, or check whether other criteria (e.g., npernode) are set that require them to do the mapping

Several MCA params resided in the rmaps base for historical reasons - they have been in the base since at least the original 1.2 release (and perhaps earlier). However, George correctly pointed out that they really should reside in their respective components. Accordingly, move them to the components, but register synonyms to the old names to avoid breaking backward compatibility.

These revisions retain the current functionality of allowing comm_spawn'd jobs to use different mappers than the original job, and for the errmgr to utilize the resilient mapper to recover processes regardless of how they were originally mapped.

Given the large number of possible combinations, I am sure that someone will find a corner-case combination of values and selection criteria that causes either no mapper to be selected or an unintended one to be used. No one can test all the ways people will use this system, so I expect debugging to continue for a while.

The ability of comm_spawn'd jobs to exploit this functionality relies on changes to the orte_dpm component - this will be committed separately.

This commit was SVN r24520.
2011-03-12 05:30:09 +00:00
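A sketch of name-based mapper selection after this change, assuming a simplified job descriptor; `struct job_sketch`, `mapper_wants_job()` and the component names used in the test are hypothetical. Giving an explicit `req_mapper` precedence over other criteria such as npernode is an assumption of this sketch, not something the commit specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical job descriptor: the requested mapper is now a component
 * name (or NULL) instead of an enum value. */
struct job_sketch {
    const char *req_mapper;
    int npernode;          /* 0 means "not set" */
};

/* Hypothetical selection test for a component named `my_name`: take the
 * job if it was requested by name; otherwise take it only if this
 * component handles npernode (`claims_npernode`) and npernode is set. */
static int mapper_wants_job(const char *my_name, int claims_npernode,
                            const struct job_sketch *job)
{
    if (job->req_mapper != NULL) {
        return strcmp(job->req_mapper, my_name) == 0;
    }
    return claims_npernode && job->npernode > 0;
}
```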
Samuel Gutierrez
0867454a06 Fixes CID #1665.
This commit was SVN r24519.
2011-03-12 03:41:49 +00:00
Samuel Gutierrez
830c7c66dc fixes CID #1667
This commit was SVN r24518.
2011-03-12 03:09:01 +00:00
Samuel Gutierrez
5cff21842a a friday night in sf, nm. fixes CID 1666.
This commit was SVN r24517.
2011-03-12 02:39:31 +00:00
Ralph Castain
e6a76cc923 Fixes CID #1954
This commit was SVN r24516.
2011-03-11 23:00:27 +00:00
Ralph Castain
45aacd30ab Add prefix for PPC hosts
This commit was SVN r24515.
2011-03-11 22:58:51 +00:00
Ralph Castain
2ccd514b9a Add version string to app
This commit was SVN r24514.
2011-03-11 20:38:37 +00:00
Samuel Gutierrez
2a2319d23a when orte_timing is enabled, always record daemon launch start time before starting the real work.
This commit was SVN r24513.
2011-03-11 00:09:23 +00:00