1008 commits

Author SHA1 Message Date
Terry Dontje
ef7ac86929 created opal_version_string and orte_version_string to match the ompi changes
made in r18345 for ompi_version_string.  This was done per request from Jeff 
Squyres to maintain consistency and to remove some warnings caused by the 
non-use of some static const char.

This commit was SVN r18461.

The following SVN revision numbers were found above:
  r18345 --> open-mpi/ompi@8dd0421015
2008-05-20 12:13:19 +00:00
Jeff Squyres
ea1582856f Clarify some messages, move AC_ARG_WITH outside of the conditional
This commit was SVN r18459.
2008-05-19 23:13:31 +00:00
Jeff Squyres
d12b21e21b Ensure that if an error occurs, we actually return that error rather
than an undefined value (which could be 0/OPAL_SUCCESS).

This commit was SVN r18452.
2008-05-19 11:57:44 +00:00
Terry Dontje
517abf9b09 This commit fixes trac:1288.
This commit was SVN r18441.

The following Trac tickets were found above:
  Ticket 1288 --> https://svn.open-mpi.org/trac/ompi/ticket/1288
2008-05-15 17:40:08 +00:00
Jeff Squyres
fb17097de4 Make ompi_info correctly display "filter" components
This commit was SVN r18435.
2008-05-13 20:56:20 +00:00
Jeff Squyres
e7ecd56bd2 This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.

= ORTE Job-Level Output Messages =

Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we have already done the search-and-replace on
the existing ORTE / OMPI layers):

 * orte_output(): (and corresponding friends ORTE_OUTPUT,
   orte_output_verbose, etc.)  This function sends the output directly
   to the HNP for processing as part of a job-specific output
   channel.  It supports all the same outputs as opal_output()
   (syslog, file, stdout, stderr), but for stdout/stderr, the output
   is sent to the HNP for processing and output.  More on this below.
 * orte_show_help(): This function is a drop-in-replacement for
   opal_show_help(), with two differences in functionality:
   1. the rendered text help message output is sent to the HNP for
      display (rather than outputting directly into the process' stderr
      stream)
   1. the HNP detects duplicate help messages and does not display them
      (so that you don't see the same error message N times, once from
      each of your N MPI processes); instead, it counts "new" instances
      of the help message and displays a message every ~5 seconds when
      there are new ones ("I got X new copies of the help message...")

opal_show_help and opal_output still exist, but they only output in
the current process.  The intent for the new orte_* functions is that
they can apply job-level intelligence to the output.  As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not the opal_* functions.

=== New code ===

For ORTE and OMPI programmers, here's what you need to do differently
in new code:

 * Do not include opal/util/show_help.h or opal/util/output.h.
   Instead, include orte/util/output.h (this one header file has
   declarations for both the orte_output() series of functions and
   orte_show_help()).
 * Effectively s/opal_output/orte_output/gi throughout your code.
   Note that orte_output_open() takes a slightly different argument
   list (as a way to pass data to the filtering stream -- see below),
   so if you explicitly call opal_output_open(), you'll need to
   slightly adapt to the new signature of orte_output_open().
 * Literally s/opal_show_help/orte_show_help/.  The function signature
   is identical.

=== Notes ===

 * orte_output'ing to stream 0 will do similar to what
   opal_output'ing did, so leaving a hard-coded "0" as the first
   argument is safe.
 * For systems that do not use ORTE's RML or the HNP, the effect of
   orte_output_* and orte_show_help will be identical to their opal
   counterparts (the additional information passed to
   orte_output_open() will be lost!).  Indeed, the orte_* functions
   simply become trivial wrappers to their opal_* counterparts.  Note
   that we have not tested this; the code is simple but it is quite
   possible that we mucked something up.

= Filter Framework =

Messages sent via the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr.  The "filter" OPAL MCA framework is intended to allow
preprocessing of messages before they are sent to their final
destinations.  The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc.  This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).

Filtering is not active by default.  Filter components must be
specifically requested, such as:

{{{
$ mpirun --mca filter xml ...
}}}

There can only be one filter component active.

= New MCA Parameters =

The new functionality described above introduces two new MCA
parameters:

 * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
   help messages will be aggregated, as described above.  If set to 0,
   all help messages will be displayed, even if they are duplicates
   (i.e., the original behavior).
 * '''orte_base_show_output_recursions''': An MCA parameter to help
   debug one of the known issues, described below.  It is likely that
   this MCA parameter will disappear before v1.3 final.

= Known Issues =

 * The XML filter component is not complete.  The current output from
   this component is preliminary and not real XML.  A bit more work
   needs to be done to make configure.m4 search for an appropriate XML
   library, link it in, and use it at run time.
 * There are possible recursion loops in the orte_output() and
   orte_show_help() functions -- e.g., if RML send calls orte_output()
   or orte_show_help().  We have some ideas how to fix these, but
   figured that it was ok to commit before feature freeze with known
   issues.  The code currently contains sub-optimal workarounds so
   that this will not be a problem, but it would be good to actually
   solve the problem rather than have hackish workarounds before v1.3 final.

This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
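The duplicate-suppression behaviour described for orte_show_help() in e7ecd56bd2 can be sketched as a small lookup table. This is only an illustration under stated assumptions: every name below (`help_should_display`, `help_take_suppressed`, the table sizes) is hypothetical and does not come from the ORTE sources, and the real HNP logic, including its ~5 second timer, is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical limits; not taken from ORTE. */
#define HELP_MAX_TOPICS 64
#define HELP_KEY_LEN 128

struct help_entry {
    char key[HELP_KEY_LEN]; /* "<help file>:<topic>" identifies one message */
    int suppressed;         /* duplicates counted since the last summary */
};

static struct help_entry help_seen[HELP_MAX_TOPICS];
static size_t help_seen_count = 0;

/* Linear search is enough for a sketch; returns NULL if the key is new. */
static struct help_entry *help_find(const char *key)
{
    size_t i;
    for (i = 0; i < help_seen_count; ++i) {
        if (strcmp(help_seen[i].key, key) == 0) {
            return &help_seen[i];
        }
    }
    return NULL;
}

/* Returns true if the message should be shown now (first sighting),
 * false if it is a duplicate that was only counted. If the table is
 * full, unknown messages are shown every time. */
static bool help_should_display(const char *file, const char *topic)
{
    char key[HELP_KEY_LEN];
    struct help_entry *entry;

    assert(file != NULL && topic != NULL);
    snprintf(key, sizeof(key), "%s:%s", file, topic);
    entry = help_find(key);
    if (entry != NULL) {
        entry->suppressed++;
        return false;
    }
    if (help_seen_count < HELP_MAX_TOPICS) {
        entry = &help_seen[help_seen_count++];
        memcpy(entry->key, key, strlen(key) + 1); /* includes the '\0' */
        entry->suppressed = 0;
    }
    return true;
}

/* Meant to be called periodically: reports how many new copies arrived
 * since the previous call and resets the counter. */
static int help_take_suppressed(const char *file, const char *topic)
{
    char key[HELP_KEY_LEN];
    struct help_entry *entry;
    int n;

    assert(file != NULL && topic != NULL);
    snprintf(key, sizeof(key), "%s:%s", file, topic);
    entry = help_find(key);
    if (entry == NULL) {
        return 0;
    }
    n = entry->suppressed;
    entry->suppressed = 0;
    return n;
}
```

A periodic caller would print something like "X new copies of the help message" whenever `help_take_suppressed()` returns a non-zero count.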
Josh Hursey
c70ba283b8 Fix a warning, and some return codes.
Thanks to Jeff for pointing this out to me.

This commit was SVN r18430.
2008-05-13 13:10:16 +00:00
Josh Hursey
4236255700 Add the framework name to the verbose message for improved debugging.
Also set the 'best_priority' to the smallest possible 32-bit integer so that negative-priority components can be selected if they are the highest-ranking components available.

This commit was SVN r18427.
2008-05-12 14:07:37 +00:00
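The reason for the 'best_priority' change in 4236255700 can be illustrated with a minimal selection loop. The structure and function names here are hypothetical and much simpler than the real MCA selection code. If the running maximum started at 0 or -1, a set of components that all report negative priorities would never produce a winner, whereas starting at `INT32_MIN` lets the highest of them win.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical component record; the real MCA structures differ. */
struct component {
    const char *name;
    int32_t priority;
};

/* Returns the index of the highest-priority component, or -1 if n == 0. */
static int select_best(const struct component *c, size_t n)
{
    int32_t best_priority = INT32_MIN; /* lowest possible starting point */
    int best = -1;
    size_t i;

    assert(c != NULL || n == 0);
    for (i = 0; i < n; ++i) {
        /* "best < 0" also accepts a component whose priority is INT32_MIN */
        if (best < 0 || c[i].priority > best_priority) {
            best_priority = c[i].priority;
            best = (int) i;
        }
    }
    return best;
}

/* Example input: every component has a negative priority. */
static const struct component example_components[] = {
    { "alpha", -20 },
    { "beta",  -5  },
    { "gamma", -10 },
};
```

With the example table, `select_best(example_components, 3)` picks "beta", the least negative entry.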
Rainer Keller
b0cbeb0b41 - Add detection of __attribute__((hot)) and __attribute__((cold))
to allow explicit grouping of hot functions into similar code
   sections upon link-time. Should decrease TLB misses (iff the code-
   section is really too large)...
   Candidates for __opal_attribute_hot__ are MPI_Isend MPI_Irecv,
   MPI_Wait, MPI_Waitall
   Candidates for __opal_attribute_cold__ are MPI_Init, MPI_Finalize and
   MPI_Abort...

This commit was SVN r18421.
2008-05-10 10:38:51 +00:00
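A rough idea of how the hot/cold attributes from b0cbeb0b41 might be wrapped is shown below. The commit detects compiler support at configure time; this sketch instead assumes a compiler that provides `__has_attribute` and falls back to empty macros otherwise. The `EXAMPLE_*` macro names and both functions are hypothetical stand-ins for `__opal_attribute_hot__` and `__opal_attribute_cold__` and their candidate functions.

```c
/* Use the attributes only when the compiler reports support for them. */
#if defined(__has_attribute)
#  if __has_attribute(hot)
#    define EXAMPLE_ATTRIBUTE_HOT __attribute__((hot))
#  endif
#  if __has_attribute(cold)
#    define EXAMPLE_ATTRIBUTE_COLD __attribute__((cold))
#  endif
#endif

/* Fallback: expand to nothing so the code still compiles everywhere. */
#ifndef EXAMPLE_ATTRIBUTE_HOT
#  define EXAMPLE_ATTRIBUTE_HOT
#endif
#ifndef EXAMPLE_ATTRIBUTE_COLD
#  define EXAMPLE_ATTRIBUTE_COLD
#endif

/* Stand-in for a frequently called function (e.g. a send path). */
EXAMPLE_ATTRIBUTE_HOT int example_send_path(int value)
{
    return value + 1;
}

/* Stand-in for a rarely called function (e.g. initialization). */
EXAMPLE_ATTRIBUTE_COLD int example_init_path(void)
{
    return 0;
}
```

The attributes only influence code placement and optimization hints; the functions behave identically with or without them.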
Josh Hursey
9b0cd5b02a Remove the 'include' check from mca_base_select. include/exclude is handled by the mca_base_open functionality and it is redundant (and wrong) to check this in the select function.
Thanks to Pak Lui for bringing this to my attention.

This commit was SVN r18418.
2008-05-08 23:41:07 +00:00
Josh Hursey
da2f1c58e2 Some checkpoint/restart cleanup.
 * Remove the opal_only option. This was suffering from bit rot, and no one uses it. It can be added back fairly easily if wanted.
 * Cleanup metadata interactions at the local level.
 * Touch up some of the INC functionality (fix typos and a minor ordering issue)

This commit was SVN r18416.
2008-05-08 18:47:47 +00:00
Josh Hursey
8739edc580 Fix a couple of OPAL_DECLSPEC that were missing from r18407
This commit was SVN r18415.

The following SVN revision numbers were found above:
  r18407 --> open-mpi/ompi@7c7b9b0486
2008-05-08 18:44:23 +00:00
George Bosilca
fe495e429a Completely remove the kqueue support on Mac OS X. Remove the test
from kqueue that tries to detect whether kqueue might work with ptys.

This commit was SVN r18411.
2008-05-08 02:33:23 +00:00
Ralph Castain
7c7b9b0486 Do a little cleanup on the opal graph class and opal carto framework to conform to OMPI naming conventions and avoid potential conflict with user applications - no change in functionality, passes carto test program
This commit was SVN r18407.
2008-05-07 19:33:49 +00:00
Josh Hursey
9971bc9d95 Merge in the mca_base_select changes per RFC:
http://www.open-mpi.org/community/lists/devel/2008/04/3779.php

{{{
svn merge -r 18276:18380 https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play .
}}}

Any components not in the trunk, but in one of the affected frameworks *must* be
updated. Contact the list, look at the RFC, or look at the diff for how to do this.

Sorry for the early commit of this, but I wanted to get it in today (per RFC) and
didn't know if I would have a chance later today.

This commit was SVN r18381.
2008-05-06 18:08:45 +00:00
Aurelien Bouteiller
c06620ad70 Add a const to the parameters of opal_dss_compare.
This commit was SVN r18374.
2008-05-05 19:12:01 +00:00
Brad Penoff
4f104ba5d1 Add header for FreeBSD.
This commit was SVN r18366.
2008-05-03 23:07:45 +00:00
George Bosilca
f5dfc005a4 Only check for /proc/cpuinfo if we are on a supported architecture.
This commit was SVN r18331.
2008-04-29 22:36:18 +00:00
George Bosilca
465f690f90 We need to force the compiler to preprocess these files as some of
them use #include. The standard way is to rename the file to .S instead
of .s.

This commit was SVN r18290.
2008-04-24 21:40:40 +00:00
Josh Hursey
2c736873bb Fix a checkpoint/restart bug that causes a restarted application to occasionally throw a SIGSEGV or SIGPIPE due to invalid socket descriptors.
The problem was caused by a bad ordering between the restart of the ORTE level tcp connections (in the OOB - out-of-band communication) and the Open MPI level tcp connections (BTLs). Before this commit ORTE would shutdown and restart the OOB completely before the OMPI level restarted its tcp connections. What would happen is that a socket descriptor used by the OMPI level on checkpoint was assigned to the ORTE level on restart. But the OMPI level had no knowledge that the socket descriptor it was previously using has been recycled so it closed it on restart. This caused the ORTE level to break as the newly created socket descriptor was closed without its knowledge.

The fix is to have the OMPI level shut down its tcp connections, allow the ORTE level to restart, and then allow the OMPI level to restart its connections. This seems obvious, and I'm surprised that this bug has not cropped up sooner. I'm confident that this specific problem has been fixed with this commit.

Thanks to Eric Roman and Tamer El Sayed for their help in identifying this problem, and patience while I was fixing it.

 * Add a new state {{{OPAL_CRS_RESTART_PRE}}}. This state identifies when we are on the down slope of the INC (finalize-like) which is useful when you want to close, but not reopen a component set for fear of interfering with a lower level.
 * Use this new state in OMPI level coordination. Here we want to make sure to play well with both the OMPI/BTL/TCP and ORTE/OOB/TCP components.
 * Update ft_event functions in PML and BML to handle the new restart state.
 * Add an additional flag to the error output in OOB/TCP so we can see what the socket descriptor was on failure as this can be helpful in debugging.

This commit was SVN r18276.
2008-04-24 17:54:22 +00:00
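The hazard behind 2c736873bb is that descriptor numbers are recycled: POSIX hands out the lowest free number, so a number one layer still remembers can silently come to belong to another layer. The sketch below demonstrates only that recycling, using a pipe in place of TCP sockets and assuming a single-threaded process. The function name is hypothetical and nothing here is taken from the OOB or BTL code.

```c
#include <unistd.h>

/* Returns 1 if a closed descriptor number was handed out again,
 * 0 if not, and -1 if a system call failed. */
static int fd_number_is_recycled(void)
{
    int fds[2];
    int stale;
    int fresh;
    int recycled;

    if (pipe(fds) != 0) {
        return -1;
    }

    stale = fds[0];      /* number that one layer keeps remembering */
    close(fds[0]);       /* ...although the descriptor is now closed */

    fresh = dup(fds[1]); /* another layer opens something new */
    if (fresh < 0) {
        close(fds[1]);
        return -1;
    }

    /* If the numbers match, a late close(stale) by the first layer
     * would destroy the second layer's brand-new descriptor. */
    recycled = (fresh == stale);

    close(fresh);
    close(fds[1]);
    return recycled;
}
```

This is why the ordering matters: the layer holding stale numbers has to close them before the other layer starts opening new descriptors.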
Shiqing Fan
4a9787979e When valgrind is not available or it is deselected (--without-valgrind, --with-valgrind=no), don't compile this component; continue without aborting.
This commit was SVN r18243.
2008-04-23 11:50:42 +00:00
Josh Hursey
cc83d41ad9 Merge in tmp/jjh-scratch
{{{
 svn merge -r 18218:18240 https://svn.open-mpi.org/svn/ompi/tmp/jjh-scratch .
}}}

Contains:
 * Primarily a fix for a user reported problem where a cached file descriptor is causing a SIGPIPE on restart.
 * Cleanup some small memory leaks from using mca_base_param_env_var() - Thanks Jeff
 * Cleanup ORTE FT tool compilation in non-FT builds - Thanks Tim P.
 * Cleanup mpi interface with misplaced {{{OPAL_CR_ENTER_LIBRARY}}} - Thanks Terry
 * Some other sundry cleanup items all dealing with C/R functionality in the trunk.

This commit was SVN r18241.
2008-04-23 00:17:12 +00:00
Jeff Squyres
db2695ccab Make the symbols be visible.
This commit was SVN r18201.
2008-04-18 00:26:17 +00:00
Ralph Castain
fa082cafa9 Shift the architecture calculation from the ompi/datatype engine to the opal/util area. This allows us to compute the architecture earlier in the launch and communicate it outside of the modex.
Note: this is an early preliminary step in the movement of portions of the datatype engine to the opal layer.

This commit was SVN r18198.
2008-04-17 20:43:56 +00:00
George Bosilca
01148b77dc Generate the help message for the available event ops. Now the list only
contains the ones that are compiled in the current ompi.

This commit was SVN r18196.
2008-04-17 18:16:54 +00:00
Ralph Castain
e7487ad533 Implement the seq rmaps module that sequentially maps process ranks to a list of hosts in a hostfile.
Restore the "do-not-launch" functionality so users can test a mapping without launching it.

Add a "do-not-resolve" cmd line flag to mpirun so the opal/util/if.c code does not attempt to resolve network addresses, thus enabling a user to test a hostfile mapping without hanging on network resolve requests.

Add a function to hostfile to generate an ordered list of host names from a hostfile

This commit was SVN r18190.
2008-04-17 13:50:59 +00:00
Shiqing Fan
49fbc4e795 These functions should always have a return value.
This commit was SVN r18174.
2008-04-16 13:54:15 +00:00
George Bosilca
b359d84661 Use the correct prefix.
This commit was SVN r18048.
2008-03-31 21:42:59 +00:00
George Bosilca
be2454e0c5 Default the temporary directory to /tmp if no special environment
variables are set.

This commit was SVN r18046.
2008-03-31 20:15:49 +00:00
George Bosilca
ee784b601e For consistency reasons always use opal_home_directory and
opal_tmp_directory.

This commit was SVN r18043.
2008-03-31 18:13:41 +00:00
George Bosilca
60111ce66d A few fewer warnings.
This commit was SVN r18025.
2008-03-30 19:06:49 +00:00
Lenny Verkhovsky
fa6a084d33 added opal/mca/paffinity/base/paffinity_base_service.c with paffinity functions
This commit was SVN r18020.
2008-03-30 12:01:02 +00:00
Lenny Verkhovsky
7e45d7e134 Few updates due to RMAPS rank_file component changes
1. applied prefix rule to functions and variables of RMAPS rank_file component
2. cleaned ompi_mpi_init.c from paffinity code
3. paffinity code moved to new opal/mca/paffinity/base/paffinity_base_service.c file
4. added opal_paffinity_slot_list mca parameter

This commit was SVN r18019.
2008-03-30 11:52:11 +00:00
Shiqing Fan
f82092566f We don't have inttypes.h on Windows, and some types are redefined.
This commit was SVN r18010.
2008-03-28 17:33:54 +00:00
Shiqing Fan
aaf2730fab Winsock2.h also has definition for timeval and so on, it conflicts with our own definitions.
This commit was SVN r18009.
2008-03-28 17:30:33 +00:00
Jeff Squyres
6ea36061cf Fix typo found by Pak.
This commit was SVN r18000.
2008-03-27 23:04:17 +00:00
Jeff Squyres
c06f7c3992 Fixes trac:1254: ensure that evport.c is in the distribution tarball.
This commit was SVN r17989.

The following Trac tickets were found above:
  Ticket 1254 --> https://svn.open-mpi.org/trac/ompi/ticket/1254
2008-03-27 16:40:55 +00:00
Sharon Melamed
afa98f92e8 Changed the for loop to a while loop so I could
release the edge without conflicting with get next.

This commit was SVN r17979.
2008-03-26 14:45:45 +00:00
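The loop change in afa98f92e8 is not shown in the log, but the general pattern it describes can be sketched as follows, assuming a plain singly linked list. The `struct edge` layout and all function names are hypothetical and not the opal graph API. A `for` loop whose increment reads `cur->next` would touch memory that the loop body has already released; a `while` loop can fetch the successor first.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical edge record. */
struct edge {
    int weight;
    struct edge *next;
};

/* Builds a list of up to n edges; stops early if allocation fails. */
static struct edge *edge_list_build(int n)
{
    struct edge *head = NULL;
    int i;

    for (i = 0; i < n; ++i) {
        struct edge *e = malloc(sizeof(*e));
        if (e == NULL) {
            break;
        }
        e->weight = i;
        e->next = head;
        head = e;
    }
    return head;
}

/* Releases every edge and returns how many were released. */
static int edge_release_all(struct edge **head)
{
    struct edge *cur;
    int released = 0;

    assert(head != NULL);
    cur = *head;
    while (cur != NULL) {
        struct edge *next = cur->next; /* "get next" before the release */
        free(cur);
        cur = next;
        ++released;
    }
    *head = NULL;
    return released;
}

/* Convenience wrapper: build n edges, release them, report the count. */
static int example_build_and_release(int n)
{
    struct edge *head = edge_list_build(n);
    return edge_release_all(&head);
}
```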
Jeff Squyres
33c09b30c2 Patch from George: ensure that we don't overwrite timer_linux_happy
improperly when checking the host type.

This commit was SVN r17975.
2008-03-26 11:22:57 +00:00
George Bosilca
4a5431ef11 Remove the event-config.h file, it is never used.
Correct the include logic that protects the headers. It's amazing
that this hasn't bitten us yet ...

This commit was SVN r17971.
2008-03-26 03:33:43 +00:00
George Bosilca
64bc580c78 Use evutil_timercmp instead of timercmp to take advantage of the
fallback installed in evutil.h.

This commit was SVN r17968.
2008-03-25 23:54:30 +00:00
George Bosilca
2e46a53b0a Avoid strcpy if it's not really required.
This commit was SVN r17962.
2008-03-25 22:40:20 +00:00
George Bosilca
028c7391d3 Coverity fix: Replace strcpy with strncpy.
This commit was SVN r17961.
2008-03-25 22:39:24 +00:00
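The kind of change named in 028c7391d3 typically looks like the sketch below. The helper name `copy_bounded` and the example buffer are hypothetical, and the actual call site touched by the commit is not shown in this log. Note that `strncpy()` does not null-terminate when the source is too long, so the terminator is written explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies src into dst (capacity dst_size bytes), always null-terminating.
 * Returns 1 if src had to be truncated, 0 otherwise. */
static int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    if (dst_size == 0) {
        return 1; /* no room even for the terminator */
    }
    strncpy(dst, src, dst_size - 1); /* never writes past the buffer */
    dst[dst_size - 1] = '\0';        /* strncpy may leave this unset */
    return strlen(src) >= dst_size;
}

/* Small buffer used to illustrate truncation. */
static char example_buf[4];
```

Copying "abcdef" into `example_buf` leaves "abc" in the buffer and reports truncation, where `strcpy()` would have overrun it.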
George Bosilca
6717b2dc75 Add the Solaris evport to the list of available event subsystems.
This commit was SVN r17958.
2008-03-25 18:00:40 +00:00
Jeff Squyres
763218e754 Fix #1253: default libevent to use select/poll and only use the other
mechanisms (such as epoll) if someone (ompi_mpi_init()) requests
otherwise.  See big comment in opal/event/event.c for a full
explanation.

This commit was SVN r17956.
2008-03-25 17:18:17 +00:00
George Bosilca
03c10e2a85 Add the Solaris evport support.
This commit was SVN r17954.
2008-03-25 16:44:27 +00:00
George Bosilca
9222ea0d0a Cast the uintptr_t to int when playing with fds.
This commit was SVN r17925.
2008-03-23 18:16:29 +00:00
Jeff Squyres
8239e40607 Add header for OS X.
This commit was SVN r17924.
2008-03-23 12:57:57 +00:00
Jeff Squyres
314ab2c6e7 Update internal libevent to upstream (v1.4.2-rc + OMPI changes).
Greatly reduce the number of "foo" -> "opal_foo" symbol renames in the
libevent source, and instead greatly expand the event_rename.h file
that uses preprocessor macros to make all public symbols be
"opal_foo".

This commit was SVN r17923.
2008-03-23 12:33:04 +00:00
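The renaming approach described in 314ab2c6e7 relies on the preprocessor: if a header maps each public name to a prefixed one before the upstream source is compiled, the source itself can stay unmodified. The sketch below shows the mechanism in a single file. The `event_counter_*` symbols are hypothetical and are not taken from event_rename.h or libevent.

```c
/* --- what a rename header might contain (hypothetical names) --- */
#define event_counter_add   opal_event_counter_add
#define event_counter_value opal_event_counter_value

/* --- "upstream" code, written with the original names --- */
static int event_counter_total = 0;

/* After preprocessing, this defines opal_event_counter_add(). */
int event_counter_add(int n)
{
    event_counter_total += n;
    return event_counter_total;
}

/* After preprocessing, this defines opal_event_counter_value(). */
int event_counter_value(void)
{
    return event_counter_total;
}
```

Callers that include the same header can keep using the short names, while the object file only exports the `opal_`-prefixed symbols, which avoids clashes with an application that links its own copy of the library.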
Jeff Squyres
dee561d29e Per recent off-list discussions about the build system, I have done
some cleanups and standardizations in the various */tools/*/ 
Makefile.am files.  This commit:

 * Somewhat simplify the tool Makefile.am's 
 * Makes the tool Makefile.am's consistent with each other (do similar
   actions in similar ways)
 * Update the tool Makefile.am's to remove old cruft that was required
   by older versions of AM (trunk requires AM >=1.10)

This commit was SVN r17921.
2008-03-22 02:04:05 +00:00
Jeff Squyres
05a7b1ed55 Remove svn:executable from these files.
This commit was SVN r17918.
2008-03-21 21:16:11 +00:00
Jeff Squyres
a4ec8a9d53 Spring cleaning -- no one is using this stuff; remove it from the tree.
This commit was SVN r17913.
2008-03-21 17:14:42 +00:00
Jeff Squyres
e0fb3957cb Patch from Brian:
* The opal_sys_timer_get_cycles() call was implemented for
   Sparc v9 using inline assembly, but not in the assembly files.
   This would only currently matter on Linux Sparc systems using
   a compiler that didn't support inline assembly (not many of
   those), but it should be there for completion.
 * The linux timer component would always build on non-Alpha
   platforms, rather than only building on platforms where
   opal_sys_timer_get_cycles() was implemented.  This would
   only matter on a very narrow set of platforms that we don't
   really support, but still, it could be more right.  We now
   only build the component on platforms where we have the
   assembly call to get the cycle counter.
 * Added a comment to opal/sys/timer.h to note that the linux
   timer component needed to be updated if another platform was
   added.

This should be harmless to commit.  It will only really change
behaviors on platforms we don't have assembly support for, which
currently won't make it through configure.  It really only matters
when (if?) we support atomic operations through libatomic_ops.

This commit was SVN r17887.
2008-03-20 00:29:36 +00:00
George Bosilca
3997639ec6 Hide what should be hidden, and expose the others. Plus some indentation.
This commit was SVN r17856.
2008-03-18 03:00:08 +00:00
Jeff Squyres
f443644bfe From Brian B.:
This commit lowers the priority of the darwin backtrace component
below that of the ''execinfo'' and ''stackprint'' components, which
will cause OS X Leopard to use the ''execinfo'' component.  execinfo
utilizes a public API for printing the stacktrace.  The ''darwin''
component uses some evil hacks and a not-so supported package from
Apple to print the stack trace.  

This commit was SVN r17840.
2008-03-17 13:39:25 +00:00
Jeff Squyres
9b18b0e9c6 Fix visibility symbols on OS X
This commit was SVN r17838.
2008-03-17 13:18:12 +00:00
George Bosilca
210631962c Add two convenience functions in order to make sure we get these
environment variables in a consistent manner. These functions
retrieve the user and the temporary directories (based on the
system).

This commit was SVN r17815.
2008-03-13 17:56:44 +00:00
Jon Mason
2e8a316ae6 opal_evtimer_initialized is missing the opening '('
This commit was SVN r17814.
2008-03-12 20:33:22 +00:00
Sharon Melamed
4a8e2a2648 Remove status check from carto initiation.
This commit was SVN r17812.
2008-03-12 08:55:28 +00:00
George Bosilca
4267f2b967 This symbol has to be visible.
This commit was SVN r17793.
2008-03-08 23:53:17 +00:00
Rainer Keller
32dcd9e551 - Adding #include <stdbool.h> with protection in r17488 and r17504
seemed to be the right thing(tm), but broke the Sun Studio C++
   compiler under Linux (ticket 747).

   This patch should allow inclusion into C and C++ from other header
   files without problems.

This commit was SVN r17792.

The following SVN revision numbers were found above:
  r17488 --> open-mpi/ompi@d53131f261
  r17504 --> open-mpi/ompi@b22e8e7567
2008-03-08 12:53:10 +00:00
Josh Hursey
aaff245271 A couple verbose additions. Poll the event engine while waiting for the
named pipe.

This commit was SVN r17787.
2008-03-07 21:10:14 +00:00
Jeff Squyres
b2ed2b95aa Fix filename so that the help file can be found.
This commit was SVN r17759.
2008-03-06 14:44:47 +00:00
Rolf vandeVaart
91af56db00 Fix a few typos so this compiles on Solaris. Remove some trailing spaces.
This commit was SVN r17746.
2008-03-05 20:16:00 +00:00
Aurelien Bouteiller
c280b81e40 Revert the last patch. Still some warning should be issued on ia32 architectures. Looking for a fix.
This commit was SVN r17745.
2008-03-05 17:20:11 +00:00
Josh Hursey
612ebdc2ac Cleanup some symbol visibility issues.
This commit was SVN r17733.
2008-03-05 13:59:25 +00:00
Josh Hursey
3b4073e32c This commit fixes the checkpoint/restart functionality on the trunk. Included in this commit are:
* Extension to the ESS framework to support C/R
 * Fixed support for {{{snapc_base_establish_global_snapshot_dir}}}
 * Fixed FileM support
 * Misc. minor code modifications

There are some outstanding visibility issues that I want to fix next.

This commit was SVN r17725.
2008-03-05 04:57:23 +00:00
Jeff Squyres
ea5c0cb4a2 Now that the nightly tarball has safely been made, let's try this
commit again.  Remove the svn:ignore from problematic directories and
try a merge from /tmp-public/plpa-merge-area2.

This commit was SVN r17718.
2008-03-05 02:45:15 +00:00
Jeff Squyres
8189fcc7d5 Back out r17702; it went very badly.
This commit was SVN r17704.

The following SVN revision numbers were found above:
  r17702 --> open-mpi/ompi@3df754ebd7
2008-03-05 00:42:39 +00:00
Jeff Squyres
3df754ebd7 Bring over PLPA v1.1 from /tmp-public/plpa-v1.1 branch.
This commit was SVN r17702.
2008-03-05 00:16:49 +00:00
Aurelien Bouteiller
284115208c Try to blindly solve warning about size_t printf format, as I can't reproduce the warning on my machines.
This commit was SVN r17701.
2008-03-04 22:30:35 +00:00
Tim Prins
824c298abf Move the carto finalize from the util finalize to the main finalize where it belongs. Otherwise, the modules are unloaded by the mca before we try to do carto_finalize, and bad things happen.
This commit was SVN r17665.
2008-02-29 12:49:04 +00:00
Tim Prins
84b2099fe8 Remove the now-unused orte_value_array. As this is the last 'class' split between orte and ompi, remove the big comment about the split in ompi_bitmap.
Also, update some properties (source files should not be executable...), and remove a couple unneeded inclusions of orte_proc_table.h

This commit was SVN r17655.
2008-02-28 21:39:42 +00:00
Tim Prins
2e1bda6d23 Remove the now-unused arithmetic interface to the dss
This commit was SVN r17654.
2008-02-28 21:36:51 +00:00
Ralph Castain
8d819cf3d3 Move carto open/close/finalize to opal layer so that ORTE can get access to topo info. This will be used to support a topo grpcomm that optimizes communications in non-uniform topologies like RR.
This commit was SVN r17652.
2008-02-28 21:04:30 +00:00
Ralph Castain
5e6928d710 Cleanup recursions in ORTE caused by processing recv'd messages that can cause the system to take action resulting in receipt of another message.
Basically, the method employed here is to have a recv create a zero-time timer event that causes the event library to execute a function that processes the message once the recv returns. Thus, any action taken as a result of processing the message occurs outside of a recv.

Created two new macros to assist:

ORTE_MESSAGE_EVENT: creates the zero-time event, passing info in a new orte_message_event_t object

ORTE_PROGRESSED_WAIT: while waiting for specified conditions, just calls progress so messages can be recv'd.

Also fixed the failed_launch function as we no longer block in the orted callback function. Updated the error messages to reflect revision. No change in API to this function, but PLM "owners" may want to check their internal error messages to avoid duplication and excessive output.

This has been tested on Mac, TM, and SLURM.

This commit was SVN r17647.
2008-02-28 19:58:32 +00:00
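The deferral idea in 5e6928d710 can be sketched without the event library: the receive callback only records the work, and a progress loop runs it after the callback has returned. Everything below is a hypothetical stand-in. A fixed-size queue replaces the zero-time timer event, and none of the names correspond to ORTE_MESSAGE_EVENT, ORTE_PROGRESSED_WAIT, or the RML API.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 16

typedef void (*message_handler_t)(int payload);

struct message_event {
    message_handler_t handler;
    int payload;
};

static struct message_event pending[MAX_PENDING];
static size_t pending_head = 0;
static size_t pending_count = 0;

static int in_recv = 0;             /* > 0 while a recv callback runs */
static int handled_inside_recv = 0; /* should stay 0 */
static int handled_total = 0;

/* Stand-in for creating a zero-time event: remember the work for later. */
static int message_event_post(message_handler_t handler, int payload)
{
    size_t slot;

    assert(handler != NULL);
    if (pending_count == MAX_PENDING) {
        return -1;
    }
    slot = (pending_head + pending_count) % MAX_PENDING;
    pending[slot].handler = handler;
    pending[slot].payload = payload;
    pending_count++;
    return 0;
}

static void process_message(int payload);

/* recv callback: queues the message and returns without processing it. */
static int on_recv(int payload)
{
    int rc;

    in_recv++;
    rc = message_event_post(process_message, payload);
    in_recv--;
    return rc;
}

/* Processing may cause another message to arrive; that is harmless here
 * because no recv callback is on the stack at this point. */
static void process_message(int payload)
{
    if (in_recv > 0) {
        handled_inside_recv++;
    }
    handled_total++;
    if (payload > 0) {
        on_recv(payload - 1);
    }
}

/* Stand-in for the progress loop: runs queued work, returns how many ran. */
static int progress(void)
{
    int ran = 0;

    while (pending_count > 0) {
        struct message_event ev = pending[pending_head];
        pending_head = (pending_head + 1) % MAX_PENDING;
        pending_count--;
        ev.handler(ev.payload);
        ran++;
    }
    return ran;
}
```

Starting with `on_recv(2)`, a single call to `progress()` handles three messages in sequence, and none of them is processed while a recv callback is active.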
George Bosilca
9d421bea2a Replace all occurrences of orte_pointer_array by opal_pointer_array. Remove the
implementation of orte_pointer_array.

This commit was SVN r17636.
2008-02-28 05:32:23 +00:00
George Bosilca
f256dd6010 Don't free the node2_name it is not yet set at this point.
This commit was SVN r17634.
2008-02-28 05:17:20 +00:00
Ralph Castain
d70e2e8c2b Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately.
Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer

This commit was SVN r17632.
2008-02-28 01:57:57 +00:00
Aurelien Bouteiller
6ea23283a8 Added a PRIsize_t constant to help printing size_t without having to cast them to long long explicitly everywhere.
This commit was SVN r17626.
2008-02-27 19:38:14 +00:00
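A format macro such as the PRIsize_t constant from 6ea23283a8 is used like the C99 `PRI*` macros from inttypes.h. The definition below is an assumption for illustration: it expands to "zu", which is correct on a C99-conforming library, whereas the definition added by the commit may differ per platform. The helper `format_length` and the output buffer are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed definition: "zu" is the C99 conversion for size_t. */
#ifndef PRIsize_t
#  define PRIsize_t "zu"
#endif

/* Formats "length=<len>" into buf and returns snprintf()'s result. */
static int format_length(char *buf, size_t buf_size, size_t len)
{
    assert(buf != NULL);
    /* Adjacent string literals are concatenated: "length=%" "zu" */
    return snprintf(buf, buf_size, "length=%" PRIsize_t, len);
}

/* Output buffer for the usage example. */
static char example_out[32];
```

This replaces the older habit of writing `"%lld"` together with a `(long long)` cast at every call site.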
Josh Hursey
5e0d17ec99 Forgot a case in which we should check if the checkpoint is ready during the
threaded CR builds. MTT caught this by running the IU FT CR test 'inflight'
which under certain timing scenarios will trigger this.

This commit was SVN r17538.
2008-02-21 13:34:27 +00:00
Josh Hursey
a169575ab2 A quick fix for opal only apps (really this time)
This commit was SVN r17537.
2008-02-20 22:33:42 +00:00
Josh Hursey
ad9fbf2a92 a fix for opal only apps
This commit was SVN r17536.
2008-02-20 21:17:08 +00:00
Josh Hursey
99144db970 Improve checkpoint/restart support by allowing a checkpoint to progress when the process is *not* in the MPI library. This involves creating a separate thread for polling for a checkpoint request. This thread is active when the MPI process is not in the MPI library, and paused when the MPI process is in the library.
Some MPI C interface files saw some spacing changes to conform to the coding standards of Open MPI.

Changed MPI C interface files to use {{{OPAL_CR_ENTER_LIBRARY()}}} and {{{OPAL_CR_EXIT_LIBRARY()}}} instead of just {{{OPAL_CR_TEST_CHECKPOINT_READY()}}}. This will allow the checkpoint/restart system more flexibility in how it is to behave.

Fixed the configure check for {{{--enable-ft-thread}}} so it has a known dependence on {{{--enable-mpi-thread}}} (and/or {{{--enable-progress-thread}}}).

Added a line for Checkpoint/Restart support to {{{ompi_info}}}.

Added some options to choose at runtime whether or not to use the checkpoint polling thread. By default, if the user asked for it to be compiled in, then it is used. But some users will want the ability to toggle its use at runtime.

There are still some places for improvement, but the feature works correctly. As always with Checkpoint/Restart, it is compiled out unless explicitly asked for at configure time. Further, if it was configured in, then it is not used unless explicitly asked for by the user at runtime.

This commit was SVN r17516.
2008-02-19 22:15:52 +00:00
Rainer Keller
b22e8e7567 - Need stdbool.h if included in userland
This commit was SVN r17504.
2008-02-19 00:39:48 +00:00
Rainer Keller
d53131f261 - Need stdbool.h if included in userland; additionally protect stdbool / stdarg.h
This commit was SVN r17488.
2008-02-18 08:11:57 +00:00
Aurelien Bouteiller
e7aaf6aa67 Patch to introduce PRI printf constants on architectures that do not provide C99 inttypes.h. Mainly useful on Windows, but might also prove helpful to deal with all the size_t and other size-changing datatypes that used to be cast to long long in printf/opal_output to avoid warnings.
This commit was SVN r17451.
2008-02-14 03:31:49 +00:00
Josh Hursey
95c31388e1 It was observed that the component constraint logic is currently only used
by the checkpoint/restart feature. Other constraints could be enforced here,
but at the moment it is only the checkpointable constraint.

So this commit just removes this logic from non-c/r builds. If someone 
wanted to add a new constraint in the future then there is a comment in
the code that directs them a bit.

This commit was SVN r17447.
2008-02-13 19:26:25 +00:00
Sharon Melamed
5b2dab2439 Reverted commit # r17443
This commit was SVN r17446.

The following SVN revision numbers were found above:
  r17443 --> open-mpi/ompi@88ce5a2b73
2008-02-13 14:07:12 +00:00
Sharon Melamed
88ce5a2b73 Replaced PLPA to the latest PLPA (plpa-1.1a3r123)
This commit was SVN r17443.
2008-02-13 13:09:11 +00:00
Rainer Keller
9cd2c6f48b - Instead of calling RUNNING_ON_VALGRIND,
implement specific function, thereby
   removing bogus requirement on valgrind/valgrind.h
   dough...
 - Call specific function runindebugger() before
   doing expensive checks on each component of struct.
 - Get rid of void* warnings..

This commit was SVN r17438.
2008-02-12 20:37:51 +00:00
Rainer Keller
7621800477 - Fix and add comments -- output full name for pd
- Protect argument in macro...

This commit was SVN r17434.
2008-02-12 16:59:59 +00:00
Rainer Keller
b20f434306 - really minor fix in comment.
This commit was SVN r17433.
2008-02-12 16:54:27 +00:00
Shiqing Fan
f5792bbda5 merging the memchecker into trunk.
This commit was SVN r17424.
2008-02-12 08:46:27 +00:00
Sharon Melamed
51f8308c68 Added Bi-Directional connection in the carto file.
This commit was SVN r17393.
2008-02-07 09:51:19 +00:00
Sharon Melamed
c9f80caf7c fixed a printing bug in case the carto file is not found.
This commit was SVN r17392.
2008-02-07 09:02:23 +00:00
Sharon Melamed
98e8de264d Wrapped the carto API in carto_base_wrapers.c
This commit was SVN r17380.
2008-02-05 19:29:16 +00:00
Sharon Melamed
9ef46de2f5 added proper wrapping to the new paffinity APIs
This commit was SVN r17379.
2008-02-05 17:37:17 +00:00
Pak Lui
6900fe36c2 Restore the solaris paffinity with an older but working implementation with processor_bind() instead of the pset_*() implementation that is commented out. There's also a fix for allowing some Sun platforms which have non-contiguous CPU IDs to do processor binding.

This commit was SVN r17309.
2008-01-29 16:09:56 +00:00
Ralph Castain
71378305ed The static-components.h file should never be under svn control - it is dynamically generated during build. Update properties to ignore that file.
Update properties to ignore the carto_file_lex.c file since that is also dynamically generated.

Update the build-hgignore.pl to properly disregard DS_Store files

This commit was SVN r17301.
2008-01-29 14:18:00 +00:00