Commit graph

9889 Commits

Author SHA1 Message Date
Brian Barrett
84d1512fba Add the potential for doing some basic error checking on mutexes during
single threaded builds.  In its default configuration, all this does
is ensure that there's at least a good chance of threads building
based on non-threaded development (since the variable names will be
checked).  There is also code to make sure that a "mutex" is never
"double locked" when using the conditional macro mutex operations.
This is off by default because there are a number of places in both
ORTE and OMPI where this alarm spews megabytes of errors on a
simple test.  So we have some work to do on our path towards
thread support.

Also removed the macro versions of the non-conditional thread locks,
as the only places they were used, the author of the code intended
to use the conditional thread locks.  So now you have upper-case
macros for conditional thread locks and lowercase functions for
non-conditional locks.  Simple, right? :).

This commit was SVN r15011.
2007-06-12 16:25:26 +00:00
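The double-lock check described in 84d1512fba can be illustrated with a minimal sketch. This is not the Open MPI code: `debug_mutex_t`, `debug_mutex_lock`, `debug_mutex_unlock`, and the two upper-case macros are hypothetical names, and the sketch assumes a build without real threads, where the "mutex" only needs to remember whether it is held.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a mutex in a build without thread support:
 * it never blocks, it only records whether it is currently held. */
typedef struct {
    bool held;    /* true between a lock and the matching unlock */
    int  errors;  /* number of misuse reports so far */
} debug_mutex_t;

static void debug_mutex_lock(debug_mutex_t *m, const char *file, int line)
{
    if (m->held) {
        /* Locking a mutex that is already held is the "double lock" case. */
        m->errors++;
        fprintf(stderr, "%s:%d: mutex locked twice\n", file, line);
    }
    m->held = true;
}

static void debug_mutex_unlock(debug_mutex_t *m, const char *file, int line)
{
    if (!m->held) {
        m->errors++;
        fprintf(stderr, "%s:%d: unlock of a mutex that is not held\n",
                file, line);
    }
    m->held = false;
}

/* Upper-case macros in the spirit of the conditional lock macros. In a
 * threaded build they would take the lock only when threads are in use;
 * here they only run the check and pass the call site along so the
 * report points at the offending line. */
#define DEBUG_THREAD_LOCK(m)   debug_mutex_lock((m), __FILE__, __LINE__)
#define DEBUG_THREAD_UNLOCK(m) debug_mutex_unlock((m), __FILE__, __LINE__)
```

Calling `DEBUG_THREAD_LOCK(&m)` twice in a row on the same `debug_mutex_t` increments `m.errors` once and prints the second call site, which is the kind of report the commit message says is currently too noisy to enable by default.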
Ralph Castain
4e8081ed1e Cleanup a now unnecessary variable
This commit was SVN r15010.
2007-06-12 14:23:33 +00:00
Tim Prins
1467558157 Cleanup a couple warnings.
Update svn:ignore

This commit was SVN r15009.
2007-06-12 14:11:06 +00:00
Ralph Castain
85df3bd92f Bring in the generalized xcast communication system along with the correspondingly revised orted launch. I will send a message out to developers explaining the basic changes. In brief:
1. generalize orte_rml.xcast to become a general broadcast-like messaging system. Messages can now be sent to any tag on the daemons or processes. Note that any message sent via xcast will be delivered to ALL processes in the specified job - you don't get to pick and choose. At a later date, we will introduce an augmented capability that will use the daemons as relays, but will allow you to send to a specified array of process names.

2. extended orte_rml.xcast so it supports more scalable message routing methodologies. At the moment, we support three: (a) direct, which sends the message directly to all recipients; (b) linear, which sends the message to the local daemon on each node, which then relays it to its own local procs; and (c) binomial, which sends the message via a binomial algo across all the daemons, each of which then relays to its own local procs. The crossover points between the algos are adjustable via MCA param, or you can simply demand that a specific algo be used.

3. orteds no longer exhibit two types of behavior: bootproxy or VM. Orteds now always behave like they are part of a virtual machine - they simply launch a job if mpirun tells them to do so. This is another step towards creating an "orteboot" functionality, but also provided a clean system for supporting message relaying.

Note one major impact of this commit: multiple daemons on a node cannot be supported any longer! Only a single daemon/node is now allowed.

This commit is known to break support for the following environments: POE, Xgrid, Xcpu, Windows. It has been tested on rsh, SLURM, and Bproc. Modifications for TM support have been made but could not be verified due to machine problems at LANL. Modifications for SGE have been made but could not be verified. The developers for the non-verified environments will be separately notified along with suggestions on how to fix the problems.

This commit was SVN r15007.
2007-06-12 13:28:54 +00:00
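The binomial routing mentioned in 85df3bd92f can be pictured with a small sketch of how a relay might work out which peers it forwards to. This is one common way to lay out a binomial tree rooted at rank 0, not the algorithm taken from the ORTE source; the function name `binomial_children` and the idea of numbering daemons 0..n-1 are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: list the ranks that `rank` forwards a message to
 * in a binomial tree rooted at rank 0 over ranks 0..n-1.
 * A rank's children are rank + step for every power of two `step`
 * that is larger than the rank itself and still lands inside the range.
 * Writes at most `cap` entries to `out` and returns how many it wrote. */
static size_t binomial_children(unsigned rank, unsigned n,
                                unsigned *out, size_t cap)
{
    size_t count = 0;

    /* step != 0 guards against the shift wrapping around for huge n. */
    for (unsigned step = 1; step != 0 && step < n; step <<= 1) {
        if (step > rank) {
            unsigned child = rank + step;
            if (child < n && count < cap) {
                out[count++] = child;
            }
        }
    }
    return count;
}
```

With eight ranks, rank 0 forwards to 1, 2, and 4, rank 1 forwards to 3 and 5, rank 2 to 6, and rank 3 to 7, so every rank receives the message exactly once and the depth grows with log2 of the number of daemons rather than linearly.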
Galen Shipman
8e7cce813e don't update MPI_ERROR
This commit was SVN r15004.
2007-06-11 21:40:29 +00:00
Galen Shipman
406b05bdc3 update copyright..
This commit was SVN r15003.
2007-06-11 21:17:49 +00:00
Galen Shipman
798cc2c5b8 handle MPI_STATUS_IGNORE in iprobe for the MTLs
This commit was SVN r15002.
2007-06-11 20:19:31 +00:00
Brian Barrett
27ad954265 Fix a couple of problems with the way we were using orte_process_name_t
structures in the system.  Instead of using memcmp, use the ns function.
This won't cause a problem as long as all three elements of the name are
ints, but if they have different sizes, alignment and padding rules
can cause memcmp() to compare padding space, which rarely holds a sane
value.

This commit was SVN r14998.
2007-06-11 19:12:11 +00:00
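The padding problem described in 27ad954265 can be shown with a short sketch. The struct below is a hypothetical three-field name with deliberately mixed field sizes, not the real `orte_process_name_t`, and `names_equal` is an illustrative stand-in for the ns comparison function the commit refers to.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name with fields of different sizes. On most ABIs the
 * compiler inserts padding after `cell` and after `vpid` so that `job`
 * is aligned and the struct size is a multiple of 4. */
typedef struct {
    uint8_t  cell;
    uint32_t job;
    uint16_t vpid;
} example_name_t;

/* Compare field by field. Padding bytes are never read, so two names
 * with equal fields always compare equal, whatever the padding holds.
 * memcmp(a, b, sizeof(*a)) gives no such guarantee, because the padding
 * bytes have unspecified values. */
static bool names_equal(const example_name_t *a, const example_name_t *b)
{
    return a->cell == b->cell &&
           a->job  == b->job  &&
           a->vpid == b->vpid;
}
```

If all three fields had the same integer type there would be no padding and `memcmp` would happen to work, which matches the commit's remark that the old code was only safe as long as all three elements are ints.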
Brian Barrett
1d11cc4b2d Fix mis-declared variable type
This commit was SVN r14994.
2007-06-11 16:48:35 +00:00
Shiqing Fan
d9fa58dc33 Add two more arguments to call. The definition of the function has been modified with 2 additional arguments.
This commit was SVN r14990.
2007-06-11 14:27:36 +00:00
Jeff Squyres
f72b52bb1d s/ifdef/if/ for OMPI_C_HAVE_VISIBILITY to enable static builds.
This commit was SVN r14985.
2007-06-11 13:20:56 +00:00
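Why `#ifdef` and `#if` differ in f72b52bb1d can be seen in a small sketch. It assumes the feature macro is always defined, to either 0 or 1, which is how configure-generated feature macros are commonly set up; `EXAMPLE_HAVE_VISIBILITY` and the two `SEEN_BY_*` macros are hypothetical names used only here.

```c
/* Pretend configure decided the feature is NOT available. */
#define EXAMPLE_HAVE_VISIBILITY 0

/* #ifdef only asks "is the macro defined?". It is, so this branch is
 * taken even though the value is 0. */
#ifdef EXAMPLE_HAVE_VISIBILITY
#define SEEN_BY_IFDEF 1
#else
#define SEEN_BY_IFDEF 0
#endif

/* #if evaluates the value, so a macro defined to 0 disables the branch. */
#if EXAMPLE_HAVE_VISIBILITY
#define SEEN_BY_IF 1
#else
#define SEEN_BY_IF 0
#endif
```

Here `SEEN_BY_IFDEF` expands to 1 and `SEEN_BY_IF` expands to 0, so code guarded by `#ifdef` would be compiled in a configuration where the feature was switched off.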
Jeff Squyres
b704ff9f4e Fix typo that accidentally always resulted in a "true" result.
This commit was SVN r14984.
2007-06-11 12:52:56 +00:00
Jeff Squyres
6d8de7f1a9 Remove some unnecessary cruft (this -I flag is now directly in the one
Makefile.am that needs it).

This commit was SVN r14983.
2007-06-11 11:19:26 +00:00
Rich Graham
ad9941005b romio configuration
This commit was SVN r14982.
2007-06-11 02:10:13 +00:00
Jeff Squyres
1ed906a78b * Forgot to update the iof null component when we revamped the IOF
framework.  Updated pointers to match current definitions.
 * Trimmed some dead wood while I was at it:
   * No need for component close function that does nothing
   * Use BEGIN/END_C_DECLS
   * Use recent MCA param register function
   * Ditch MCA param orte_iof_debug (it wasn't used anywhere)
   * Use MCA param orte_iof_override properly in the code (i.e., look
     up the value once and use the cached value later)

This commit was SVN r14981.
2007-06-11 00:58:21 +00:00
Jeff Squyres
9fb2e807a9 Remove some unnecessary code, probably dating back to before we had
generalized component include/exclude infrastructure.  This commit
removes the oob_base_include and oob_base_exclude MCA params because
they have long-since been handled by the "oob" MCA parameter in the
MCA base.

This commit was SVN r14979.
2007-06-10 14:16:05 +00:00
Tim Mattox
d3c01a6978 The openib compiler warning was not in a 1.2.x release... removing NEWS entry.
This commit was SVN r14977.
2007-06-09 13:16:19 +00:00
Tim Mattox
bcf9fb6fda Add two entries to the NEWS file's 1.2.3 section.
This commit was SVN r14975.
2007-06-09 13:04:35 +00:00
Jeff Squyres
c64fc30b41 Add note about removing sysfsutils dependency.
This commit was SVN r14972.
2007-06-09 02:41:43 +00:00
Jeff Squyres
36679de8d8 Fixes trac:1045.
libsysfs headers are required for libibverbs v1.0 (i.e., OFED 1.0 and
OFED 1.1), meaning that <infiniband/verbs.h> would #include
<sysfs/libsysfs.h>.  Hence, if the libsysfs headers did not exist on a
system, including <verbs.h> would fail.  

With older versions of Autoconf, we would simply test for the
''presence'' of the <infiniband/verbs.h> and not actually try to
''use'' it.  This could leave OMPI in a weird situation on systems
that did not have the sysfs headers installed: configure would
complete successfully, but the build of the openib btl would fail.
Some users complained, thinking that there was a real compile error in
the OMPI code base.

Hence, we decided that it would be better to AC_CHECK_HEADER for the
sysfs header files in configure.  If the sysfs header files were not
found, configure would abort.  Users generally understand when
configure aborts, and know how to read the output and fix the
underlying problem; it was ''much'' more obvious than having the OMPI
build fail for nebulous reasons much later.

Note that we also checked for / added -lsysfs, but that wasn't
necessary because libibverbs already run-time linked to it (i.e.,
libibverbs couldn't have been installed if the sysfs libraries weren't
installed).

However, there are now two reasons why the check for sysfs's header
files is no longer necessary:

 * Newer versions of Autoconf are now used for OMPI tarballs that
   check for both the presence '''and''' usability of header files.
   Hence, AC_CHECK_HEADER for <infiniband/verbs.h> will actually try
   to ''use'' it, so if the sysfs header files are not installed,
   AC_CHECK_HEADER will (rightfully) fail.
 * libibverbs v1.1 (i.e., OFED 1.2 and beyond) does not require
   libsysfs at all (headers or libraries).  

When checking for the sysfs header files, OMPI's configure ''forces''
you to have sysfs installed, even though it may not be needed (e.g.,
libibverbs v1.1 and beyond).  Clearly, this is not good (especially
since the sysfs software package is now deprecated, and some Linux
distros no longer install it by default).

So this commit simply removes the check for the sysfs header files and
libraries, allowing OMPI to be built on systems with libibverbs >=1.1 that
do not have sysfs installed.

For systems with libibverbs 1.0, if they do not have the sysfs headers
installed, we'll still fail AC_CHECK_HEADER and therefore still fail
configure properly.  I expanded the warning message to say that if
libibverbs 1.0 is being used, check to ensure that sysfs is installed,
yadda yadda yadda.

This commit was SVN r14971.

The following Trac tickets were found above:
  Ticket 1045 --> https://svn.open-mpi.org/trac/ompi/ticket/1045
2007-06-08 23:34:05 +00:00
Jeff Squyres
6710f202d9 Add note about IOF/stdin fixes.
This commit was SVN r14970.
2007-06-08 23:15:42 +00:00
Jeff Squyres
4f3a11b4db Fixes trac:967.
A bunch of fixes from the /tmp/iof-fixes branch that fix up ''some''
(but not ''all'') of the problems that we have seen with iof:

 * Reading very large files via stdin redirected to orterun (Sun saw
   this)
 * Reading a little bit of a large file redirected to orterun's stdin
   and then either closing stdin or exiting the process

The Big Change was to make the proxy iof (the one running in non-HNP
orteds) send an "I'm closing the stream" ACK back to the service
iof.  This tells the HNP that there will be nothing more coming from
that peer, and therefore the iof forward should be removed.

Many other minor cleanups/fixes, terminology changes, and
documentation additions are included in this commit as well.  However,
there are still some pretty big outstanding issues with IOF that are
not addressed either by #967 or this commit.  A few examples:

 * IOF was designed to allow multiple subscribers to a single stream.
   We're not entirely sure that this works (for one thing, there is
   nothing in the ORTE/OMPI code base that uses this functionality).
 * There are also resources leaked when processes/jobs exit (per
   Ralph's first comment on this ticket).  
 * There is no feedback to close orterun's stdin when all subscribers
   to the corresponding stream have closed stdin.

This commit was SVN r14967.

The following Trac tickets were found above:
  Ticket 967 --> https://svn.open-mpi.org/trac/ompi/ticket/967
2007-06-08 22:59:31 +00:00
George Bosilca
e2dd0a50fc A better version allowing for multi-rails or clusters of clusters. A lot of cleanups.
This commit was SVN r14963.
2007-06-08 20:37:20 +00:00
George Bosilca
c66cf32ee2 Cleaning up. Removing all unused variables and fields in the MX BTL and
component structures.

This commit was SVN r14957.
2007-06-07 21:02:18 +00:00
George Bosilca
5d6c958066 Enable the MTLs to be compiled in a visibility featured environment.
This commit was SVN r14955.
2007-06-07 20:14:53 +00:00
Jeff Squyres
af0d875302 Fix linux component for static builds.
This commit was SVN r14952.
2007-06-07 12:47:40 +00:00
Gleb Natapov
423f404c34 Shut up compiler warning. Ugly, but I can't see a better way except changing
the converter to use uint64_t (ssize_t?) for offset.

This commit was SVN r14950.
2007-06-07 11:33:28 +00:00
Gleb Natapov
9f9b64db4e Revert r14947 as this doesn't solve the problem.
This commit was SVN r14949.

The following SVN revision numbers were found above:
  r14947 --> open-mpi/ompi@5b9fe28e3f
2007-06-07 11:24:24 +00:00
Gleb Natapov
5b9fe28e3f Fix warning on 32bit systems.
This commit was SVN r14947.
2007-06-07 08:57:34 +00:00
Jeff Squyres
dc88b79a9c Add bullet about PLPA addition.
This commit was SVN r14945.
2007-06-07 01:01:05 +00:00
Jeff Squyres
f3ee5fc3ec This component should be ok now.
This commit was SVN r14944.
2007-06-07 01:00:03 +00:00
Jeff Squyres
84e3a02064 Fix for not being able to build on systems other than Linux: move
AM_CONDITIONAL's outside of conditional logic so that they are always
executed.

This commit was SVN r14943.
2007-06-07 00:59:39 +00:00
Tim Mattox
eee0c31a73 Add one entry to the NEWS file's 1.2.3 section.
This commit was SVN r14941.
2007-06-07 00:58:31 +00:00
George Bosilca
976bad3ae7 The updated m4 file for detecting MX extensions. They are used to retrieve
the mapper MAC.

This commit was SVN r14938.
2007-06-07 00:44:47 +00:00
Tim Prins
06bf4c3f3b fix some printf warnings
This commit was SVN r14934.
2007-06-06 22:37:26 +00:00
Tim Prins
31a3430c85 Fix threaded builds broken by r14914
This commit was SVN r14933.

The following SVN revision numbers were found above:
  r14914 --> open-mpi/ompi@983fd3432a
2007-06-06 22:30:34 +00:00
George Bosilca
6a5e039466 Allow smart connection to be setup. Each peer now has attached to it a unique
id based on the last half of the mapper MAC. This allows us to figure out how
to connect peers. This allows the MX BTL to be used in a cluster of clusters
configuration where each cluster has MX internally as well as on a multi
rail MX system.

This commit was SVN r14932.
2007-06-06 21:42:11 +00:00
Ralph Castain
e0e4163f53 Remove pithy comment
This commit was SVN r14930.
2007-06-06 20:26:52 +00:00
George Bosilca
3b7f3e5565 Keep the unknown shell string.
This commit was SVN r14929.
2007-06-06 20:24:42 +00:00
George Bosilca
29dd535c01 Remove all references to the orte_bitmap as well as the files.
This commit was SVN r14928.
2007-06-06 20:24:07 +00:00
George Bosilca
fbb46f0ee7 A faster search without the bitmap. Remove all references to the orte_bitmap.
This commit was SVN r14926.
2007-06-06 20:23:14 +00:00
George Bosilca
24eae5c1ec We have a goto label for cleanup so make sure we always use it. This way we ensure
that the locks are correctly released in all cases.

This commit was SVN r14925.
2007-06-06 20:20:52 +00:00
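The single cleanup label that 24eae5c1ec relies on follows a familiar C pattern, sketched below under simplifying assumptions. `toy_lock_t` is a hypothetical counter standing in for a real mutex so that the effect is easy to observe, and `copy_under_lock` is an invented function, not code from the repository.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical lock: depth is 0 when free, 1 while held. */
typedef struct { int depth; } toy_lock_t;

static void toy_lock(toy_lock_t *l)   { l->depth++; }
static void toy_unlock(toy_lock_t *l) { l->depth--; }

/* Copy `src` into `dst` through a temporary buffer while holding `lock`.
 * Every exit after the lock is taken goes through the `cleanup` label,
 * so the lock is released and the buffer freed on success and on every
 * error path alike. Returns 0 on success, -1 on failure. */
static int copy_under_lock(toy_lock_t *lock, const char *src,
                           char *dst, size_t dst_len)
{
    int rc = -1;
    char *tmp = NULL;
    size_t len;

    toy_lock(lock);

    if (NULL == src || NULL == dst) {
        goto cleanup;
    }
    len = strlen(src);
    if (len + 1 > dst_len) {
        goto cleanup;              /* destination too small */
    }
    tmp = malloc(len + 1);
    if (NULL == tmp) {
        goto cleanup;              /* out of memory */
    }
    memcpy(tmp, src, len + 1);
    memcpy(dst, tmp, len + 1);
    rc = 0;

cleanup:
    free(tmp);                     /* free(NULL) is a no-op */
    toy_unlock(lock);
    return rc;
}
```

An early `return -1` in any of the error branches would leave `depth` at 1, which is the kind of missed release the commit message is guarding against.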
George Bosilca
28c9d0758b These functions are supposed to be static so there is no reason to have them declared in the header file.
This commit was SVN r14924.
2007-06-06 20:18:37 +00:00
George Bosilca
b047ed75d7 Don't forget to free the temporary buffer.
This commit was SVN r14923.
2007-06-06 20:17:27 +00:00
Galen Shipman
5340f5e320 Try to cleanup the flow control logic a bit
Renamed a few variables 
Initialize the reserve receive buffers to 1, prior to this they were initialized
to zero. 

This commit was SVN r14919.
2007-06-06 18:51:09 +00:00
Jeff Squyres
fdef72cf62 This component seems to be working now; removing the .ompi_ignore.
This commit was SVN r14918.
2007-06-06 18:43:46 +00:00
Jeff Squyres
2e0b1b442f * Fix up some version numbers
* Re-add module finalize function support

This commit was SVN r14917.
2007-06-06 18:36:04 +00:00
Ralph Castain
ea0c03fd7a Revert out r14910. Turns out that the GPR *has* to be able to deal with NULL data values. We fixed this a long time ago on the "put" side, but never dealt with it for "get" - hence, we could "put" ORTE_UNDEF'd attributes in a mapping policy, but couldn't retrieve them. This is why you only encountered the error on comm_spawn and not during the original launch of a job.
This correctly repairs the problem by enabling the GPR's "get" function to correctly handle NULL data values.

This commit was SVN r14916.

The following SVN revision numbers were found above:
  r14910 --> open-mpi/ompi@0757467d77
2007-06-06 18:34:54 +00:00
Jeff Squyres
d8b06a2eff Bump framework version number up to 1.1.0, therefore [mostly]
re-enabling compilation of this component.

However, it still won't compile because this component provides a
module finalize function which apparently somehow got dropped from the
paffinity base.  Support for the paffinity module finalize function
needs to be re-added.

This commit was SVN r14915.
2007-06-06 17:46:04 +00:00
Ralph Castain
983fd3432a Fix singleton comm_spawn. Ensure that singletons start the RML receive function so they can receive RML updates during xconnect procedures once any comm_spawn'd children start. Since singletons only use the RMGR/URM component, update that component to also hold us until xconnect is completed (if it is invoked) before returning to the caller.
This commit was SVN r14914.
2007-06-06 17:39:23 +00:00