1
1
Граф коммитов

941 Коммитов

Автор SHA1 Сообщение Дата
George Bosilca
f5dfc005a4 Only check for /proc/cpuinfo if we are on a supported architecture.
This commit was SVN r18331.
2008-04-29 22:36:18 +00:00
George Bosilca
465f690f90 We need to force the compiler to preprocess these files as some of
them use #include. The standard way is to rename to file .S instead
of .s.

This commit was SVN r18290.
2008-04-24 21:40:40 +00:00
Josh Hursey
2c736873bb Fix a checkpoint/restart bug that causes a restarted application to occasionally throw a SIGSEGV or SIGPIPE due to invalid socket descriptors.
The problem was caused by a bad ordering between the restart of the ORTE level tcp connections (in the OOB - out-of-band communication) and the Open MPI level tcp connections (BTLs). Before this commit ORTE would shutdown and restart the OOB completely before the OMPI level restarted its tcp connections. What would happen is that a socket descriptor used by the OMPI level on checkpoint was assigned to the ORTE level on restart. But the OMPI level had no knowledge that the socket descriptor it was previously using has been recycled so it closed it on restart. This caused the ORTE level to break as the newly created socket descriptor was closed without its knowledge.

The fix is to have the OMPI level shutdown tcp connections, allow the ORTE level to restart, and then allow the OMPi level to restart its connections. This seems obvious, and I'm surprised that this bug has not cropped up sooner. I'm confident that this specific problem has been fixed with this commit.

Thanks to Eric Roman and Tamer El Sayed for their help in identifying this problem, and patience while I was fixing it.

 * Add a new state {{{OPAL_CRS_RESTART_PRE}}}. This state identifies when we are on the down slope of the INC (finalize-like) which is useful when you want to close, but not reopen a component set for fear of interfering with a lower level.
 * Use this new state in OMPI level coordination. Here we want to make sure to play well with both the OMPI/BTL/TCP and ORTE/OOB/TCP components.
 * Update ft_event functions in PML and BML to handle the new restart state.
 * Add an additional flag to the error output in OOB/TCP so we can see what the socket descriptor was on failure as this can be helpful in debugging.

This commit was SVN r18276.
2008-04-24 17:54:22 +00:00
Shiqing Fan
4a9787979e When valgrind is not available or it is deselected (--without-valgrind, --with-valgrind=no), don't compile this component, continue without abortion.
This commit was SVN r18243.
2008-04-23 11:50:42 +00:00
Josh Hursey
cc83d41ad9 Merge in tmp/jjh-scratch
{{{
 svn merge -r 18218:18240 https://svn.open-mpi.org/svn/ompi/tmp/jjh-scratch .
}}}

Contains:
 * Primarily a fix for a user reported problem where a cached file descriptor is causing a SIGPIPE on restart.
 * Cleanup some small memory leaks from using mca_base_param_env_var() - Thanks Jeff
 * Cleanup ORTE FT tool compilation in non-FT builds - Thanks Tim P.
 * Cleanup mpi interface with missplaced {{{OPAL_CR_ENTER_LIBRARY}}} - Thanks Terry
 * Some other sundry cleanup items all dealing with C/R functionality in the trunk.

This commit was SVN r18241.
2008-04-23 00:17:12 +00:00
Jeff Squyres
db2695ccab Make the symbols be visible.
This commit was SVN r18201.
2008-04-18 00:26:17 +00:00
Ralph Castain
fa082cafa9 Shift the architecture calculation from the ompi/datatype engine to the opal/util area. This allows us to compute the architecture earlier in the launch and communicate it outside of the modex.
Note: this is an early preliminary step in the movement of portions of the datatype engine to the opal layer.

This commit was SVN r18198.
2008-04-17 20:43:56 +00:00
George Bosilca
01148b77dc Generate the help message for the available event ops. Now the list only
contains the one that are compiled on the current ompi.

This commit was SVN r18196.
2008-04-17 18:16:54 +00:00
Ralph Castain
e7487ad533 Implement the seq rmaps module that sequentially maps process ranks to a list hosts in a hostfile.
Restore the "do-not-launch" functionality so users can test a mapping without launching it.

Add a "do-not-resolve" cmd line flag to mpirun so the opal/util/if.c code does not attempt to resolve network addresses, thus enabling a user to test a hostfile mapping without hanging on network resolve requests.

Add a function to hostfile to generate an ordered list of host names from a hostfile

This commit was SVN r18190.
2008-04-17 13:50:59 +00:00
Shiqing Fan
49fbc4e795 These functions should always have a return value.
This commit was SVN r18174.
2008-04-16 13:54:15 +00:00
George Bosilca
b359d84661 Use the correct prefix.
This commit was SVN r18048.
2008-03-31 21:42:59 +00:00
George Bosilca
be2454e0c5 Default the temporary directory to /tmp if no special environment
variables are set.

This commit was SVN r18046.
2008-03-31 20:15:49 +00:00
George Bosilca
ee784b601e For consistency reasons always use opal_home_directory and
opal_tmp_directory.

This commit was SVN r18043.
2008-03-31 18:13:41 +00:00
George Bosilca
60111ce66d Few less warnings.
This commit was SVN r18025.
2008-03-30 19:06:49 +00:00
Lenny Verkhovsky
fa6a084d33 added opal/mca/paffinity/base/paffinity_base_service.c with paffinity functions
This commit was SVN r18020.
2008-03-30 12:01:02 +00:00
Lenny Verkhovsky
7e45d7e134 Few updates due to RMAPS rank_file component changes
1. applied prefix rule to functions and variables of RMAPS rank_file component
2. cleaned ompi_mpi_init.c from paffinity code
3. paffinity code moved to new opal/mca/paffinity/base/paffinity_base_service.c file
4. added opal_paffinity_slot_list mca parameter

This commit was SVN r18019.
2008-03-30 11:52:11 +00:00
Shiqing Fan
f82092566f We don't have inttypes.h on Windows, and some types are redefined.
This commit was SVN r18010.
2008-03-28 17:33:54 +00:00
Shiqing Fan
aaf2730fab Winsock2.h also has definition for timeval and so on, it conflicts with our own definitions.
This commit was SVN r18009.
2008-03-28 17:30:33 +00:00
Jeff Squyres
6ea36061cf Fix typo found by Pak.
This commit was SVN r18000.
2008-03-27 23:04:17 +00:00
Jeff Squyres
c06f7c3992 Fixes trac:1254: ensure that evport.c is in the distribution tarball.
This commit was SVN r17989.

The following Trac tickets were found above:
  Ticket 1254 --> https://svn.open-mpi.org/trac/ompi/ticket/1254
2008-03-27 16:40:55 +00:00
Sharon Melamed
afa98f92e8 Changed the for loop to a while loop so I could
release the edge without conflicting with get next.

This commit was SVN r17979.
2008-03-26 14:45:45 +00:00
Jeff Squyres
33c09b30c2 Patch from George: ensure that we don't overwrite timer_linux_happy
improperly when checking the host type.

This commit was SVN r17975.
2008-03-26 11:22:57 +00:00
George Bosilca
4a5431ef11 Remove the event-config.h file, it is never used.
Correct the include logic that protect the headers. It's amazing
that this didn't bite us yet ...

This commit was SVN r17971.
2008-03-26 03:33:43 +00:00
George Bosilca
64bc580c78 Use evutil_timercmp instead of timercmp to take advantage of the
fallback installed in evutil.h.

This commit was SVN r17968.
2008-03-25 23:54:30 +00:00
George Bosilca
2e46a53b0a Avoid strcpy if its not really required.
This commit was SVN r17962.
2008-03-25 22:40:20 +00:00
George Bosilca
028c7391d3 Coverty fix: Replace strcpy by strncpy.
This commit was SVN r17961.
2008-03-25 22:39:24 +00:00
George Bosilca
6717b2dc75 Add the Solaris evport to the list of available event subsystems.
This commit was SVN r17958.
2008-03-25 18:00:40 +00:00
Jeff Squyres
763218e754 Fix #1253: default libevent to use select/poll and only use the other
mechanisms (such as epoll) if someone (ompi_mpi_init()) requests
otherwise.  See big comment in opal/event/event.c for a full
explanation.

This commit was SVN r17956.
2008-03-25 17:18:17 +00:00
George Bosilca
03c10e2a85 Add the Solaris evport support.
This commit was SVN r17954.
2008-03-25 16:44:27 +00:00
George Bosilca
9222ea0d0a Cast the uintptr_t to int when playing with fds.
This commit was SVN r17925.
2008-03-23 18:16:29 +00:00
Jeff Squyres
8239e40607 Add header for OS X.
This commit was SVN r17924.
2008-03-23 12:57:57 +00:00
Jeff Squyres
314ab2c6e7 Update internal libevent to upstream (v1.4.2-rc + OMPI changes).
Greatly reduce the number of "foo" -> "opal_foo" symbol renames in the
libevent source, and instead greatly expand the event_rename.h file
that uses preprocessor macros to make all public symbols be
"opal_foo".

This commit was SVN r17923.
2008-03-23 12:33:04 +00:00
Jeff Squyres
dee561d29e Per recent off-list discussions about the build system, I have done
some cleanups and standardizations in the various */tools/*/ 
Makefile.am files.  This commit:

 * Somewhat simplify the tool Makefile.am's 
 * Makes the tool Makefile.am's consistent with each other (do similar
   actions in similar ways)
 * Update the tool Makefile.am's to remove old kruft that was required
   by older versions of AM (trunk requires AM >=1.10)

This commit was SVN r17921.
2008-03-22 02:04:05 +00:00
Jeff Squyres
05a7b1ed55 Remove svn:executable from these files.
This commit was SVN r17918.
2008-03-21 21:16:11 +00:00
Jeff Squyres
a4ec8a9d53 Spring cleaning -- no one is using this stuff; remove it from the tree.
This commit was SVN r17913.
2008-03-21 17:14:42 +00:00
Jeff Squyres
e0fb3957cb Patch from Brian:
* The opal_sys_timer_get_cycles() call was implemented for
   Sparc v9 using inline assembly, but not in the assembly files.
   This would only currently matter on Linux Sparc systems using
   a compiler that didn't support inline assembly (not many of
   those), but it should be there for completion.
 * The linux timer component would always build on non-Alpha
   platforms, rather than only building on platforms where
   opal_sys_timer_get_cycles() was implemented.  This would
   only matter on a very narrow set of platforms that we don't
   really support, but still, it could be more right.  We now
   only build the component on platforms where we have the
   assembly call to get the cycle counter.
 * Added a comment to opal/sys/timer.h to note that the linux
   timer component needed to be updated if another platform was
   added.

This should be harmless to commit.  It will only really change
behaviors on platforms we don't have assembly support for, which
currently won't make it through configure.  It really only matters
when (if?) we support atomic operations through libatomic_ops.

This commit was SVN r17887.
2008-03-20 00:29:36 +00:00
George Bosilca
3997639ec6 Hide what should be hidden, and expose the others. Plus some indentation.
This commit was SVN r17856.
2008-03-18 03:00:08 +00:00
Jeff Squyres
f443644bfe From Brian B.:
This commit lowers the priority of the darwin backtrace component
below that of the ''execinfo'' and ''stackprint'' components, which
will cause OS X Leopard to use the ''execinfo'' component.  execinfo
utilizes a public API for printing the stacktrace.  The ''darwin''
component uses some evil hacks and a not-so supported package from
Apple to print the stack trace.  

This commit was SVN r17840.
2008-03-17 13:39:25 +00:00
Jeff Squyres
9b18b0e9c6 Fix visibility symbols on OS X
This commit was SVN r17838.
2008-03-17 13:18:12 +00:00
George Bosilca
210631962c Add two convenience functions in order to make sure we get these
environment variables in a consistent manner. These functions
retrieve the user and the temporary directories (based on the
system).

This commit was SVN r17815.
2008-03-13 17:56:44 +00:00
Jon Mason
2e8a316ae6 opal_evtimer_initialized is missing the opening '('
This commit was SVN r17814.
2008-03-12 20:33:22 +00:00
Sharon Melamed
4a8e2a2648 Renove status check from carto initiation.
This commit was SVN r17812.
2008-03-12 08:55:28 +00:00
George Bosilca
4267f2b967 This symbol have to be visible.
This commit was SVN r17793.
2008-03-08 23:53:17 +00:00
Rainer Keller
32dcd9e551 - Adding #include <stdbool.h> with protection in r17488 and r17504
seemed to be the right thing(tm), but broke the Sun Studio C++
   compiler under Linux (ticket 747).

   This patch should allow inclusion into C and C++ from other header
   files without problems.

This commit was SVN r17792.

The following SVN revision numbers were found above:
  r17488 --> open-mpi/ompi@d53131f261
  r17504 --> open-mpi/ompi@b22e8e7567
2008-03-08 12:53:10 +00:00
Josh Hursey
aaff245271 A couple verbose additions. Poll the event engine while waiting for the
named pipe.

This commit was SVN r17787.
2008-03-07 21:10:14 +00:00
Jeff Squyres
b2ed2b95aa Fix filename so that the help file can be found.
This commit was SVN r17759.
2008-03-06 14:44:47 +00:00
Rolf vandeVaart
91af56db00 Fix a few typos so this compiles on Solaris. Remove some trailing spaces.
This commit was SVN r17746.
2008-03-05 20:16:00 +00:00
Aurelien Bouteiller
c280b81e40 Revert the last patch. Still some warning should be issued on ia32 architectures. Looking for a fix.
This commit was SVN r17745.
2008-03-05 17:20:11 +00:00
Josh Hursey
612ebdc2ac Cleanup some symbol visability issues.
This commit was SVN r17733.
2008-03-05 13:59:25 +00:00
Josh Hursey
3b4073e32c This commit fixes the checkpoint/restart functionality on the trunk. Included in this commit are:
* Extension to the ESS framework to support C/R
 * Fixed support for {{{snapc_base_establish_global_snapshot_dir}}}
 * Fixed FileM support
 * Misc. minor code modifications

There are some outstanding visability issues that I want to fix next.

This commit was SVN r17725.
2008-03-05 04:57:23 +00:00