1
1
Граф коммитов

11204 Коммитов

Автор SHA1 Сообщение Дата
George Bosilca
028c7391d3 Coverty fix: Replace strcpy by strncpy.
This commit was SVN r17961.
2008-03-25 22:39:24 +00:00
Jeff Squyres
183fcdf51b Remove duplicate free(), fixing CID 973.
This commit was SVN r17959.
2008-03-25 20:30:56 +00:00
George Bosilca
6717b2dc75 Add the Solaris evport to the list of available event subsystems.
This commit was SVN r17958.
2008-03-25 18:00:40 +00:00
Jeff Squyres
763218e754 Fix #1253: default libevent to use select/poll and only use the other
mechanisms (such as epoll) if someone (ompi_mpi_init()) requests
otherwise.  See big comment in opal/event/event.c for a full
explanation.

This commit was SVN r17956.
2008-03-25 17:18:17 +00:00
George Bosilca
03c10e2a85 Add the Solaris evport support.
This commit was SVN r17954.
2008-03-25 16:44:27 +00:00
Ralph Castain
d6eeb67086 Update the RR platform files
This commit was SVN r17953.
2008-03-25 15:04:04 +00:00
Ralph Castain
90107f3c14 Fix an issue with comm_spawn over who sent/recv first in the modex. The modex assumes that the first name on the list is the "root" that will serve as the allgather collector/distributor. The dpm was putting that entity last, which forced us to pre-inform the parent procs of the child proc's contact info since the parent was trying to send to the child.
Clarify the setting of send_first in the mpi bindings (trivial, i know, but helpful)

Remove the extra xcast of child contact info to the parent job.

This commit was SVN r17952.
2008-03-25 14:57:34 +00:00
Galen Shipman
f1e3045296 need orted_LDFLAGS as a placeholder
you will need to re autogen.sh 

This commit was SVN r17951.
2008-03-25 13:41:09 +00:00
Ralph Castain
cca449e379 Move an OMPI RML tag to the OMPI layer
This commit was SVN r17950.
2008-03-25 13:30:48 +00:00
Jeff Squyres
f97130c3aa Sanity check: don't allow --disable-binaries with --enable-dist.
This commit was SVN r17946.
2008-03-25 12:01:36 +00:00
Jeff Squyres
5320c91ab3 Oops -- fix the constructor to also use opal_object_t instead of
opal_list_item_t.

This commit was SVN r17945.
2008-03-25 11:59:50 +00:00
Ralph Castain
ec76fe4fe4 Fix singletons - only go through orte_proc_info_init once! Session dir is calling it be sure it is filled in, which was causing us to reset the fields.
This commit was SVN r17944.
2008-03-25 02:10:42 +00:00
Galen Shipman
0116041133 BTL shouldn't own the passive side's descriptor in the PML get protocol. The BTL
doesn't know when to free it on the passive side. 

This commit was SVN r17943.
2008-03-25 01:43:41 +00:00
Ralph Castain
ebea4d04e4 Remove defunct error constant - we no longer have a GPR that can hold corrupt data!
This commit was SVN r17942.
2008-03-24 21:05:14 +00:00
Ralph Castain
4efddc7b0a Fix the allgather and allgather_list functions to avoid deadlocks at large node/proc counts. Violated the RML rules here - we received the allgather buffer and then did an xcast, which causes a send to go out, and is then subsequently received by the sender. This fix breaks that pattern by forcing the recv to complete outside of the function itself - thus, the allgather and allgather_list always complete their recvs before returning or sending.
Reogranize the grpcomm code a little to provide support for soon-to-come new grpcomm components. The revised organization puts what will be common code elements in the base to avoid duplication, while allowing components that don't need those functions to ignore them.

This commit was SVN r17941.
2008-03-24 20:50:31 +00:00
Jeff Squyres
ebfdd133f5 AFACT, we never put endpoints on a list.
This commit was SVN r17940.
2008-03-24 18:32:55 +00:00
Ralph Castain
58d51f2689 Revert that! Need to complete the rest of the change so the orted knows the correct nodeid...
Sorry

This commit was SVN r17939.
2008-03-24 18:17:26 +00:00
Ralph Castain
dae4518878 Use the correct nodeid!
This commit was SVN r17938.
2008-03-24 18:15:08 +00:00
Jeff Squyres
004c3a5b09 Ensure to cover all cases when either ORTE or OMPI is not yet
initialized.  For example, there is a period of time during
ompi_mpi_init when orte_initialized==true, but
ompi_mpi_initialized==false (and therefore communicators are not setup
yet, etc.).

This commit was SVN r17937.
2008-03-24 16:25:14 +00:00
Ralph Castain
dc7f45dafd Remove the obsolete and largely unused orte_system_info structure. The only fields that were used in that struct were nodeid and nodename - these have been transferred to the orte_process_info structure.
Only one place used the user name field - session_dir, when formulating the name of the top-level directory. Accordingly, the code for getting the user's id has been moved to the session_dir code.

This commit was SVN r17926.
2008-03-23 23:10:15 +00:00
George Bosilca
9222ea0d0a Cast the uintptr_t to int when playing with fds.
This commit was SVN r17925.
2008-03-23 18:16:29 +00:00
Jeff Squyres
8239e40607 Add header for OS X.
This commit was SVN r17924.
2008-03-23 12:57:57 +00:00
Jeff Squyres
314ab2c6e7 Update internal libevent to upstream (v1.4.2-rc + OMPI changes).
Greatly reduce the number of "foo" -> "opal_foo" symbol renames in the
libevent source, and instead greatly expand the event_rename.h file
that uses preprocessor macros to make all public symbols be
"opal_foo".

This commit was SVN r17923.
2008-03-23 12:33:04 +00:00
Jeff Squyres
dee561d29e Per recent off-list discussions about the build system, I have done
some cleanups and standardizations in the various */tools/*/ 
Makefile.am files.  This commit:

 * Somewhat simplify the tool Makefile.am's 
 * Makes the tool Makefile.am's consistent with each other (do similar
   actions in similar ways)
 * Update the tool Makefile.am's to remove old kruft that was required
   by older versions of AM (trunk requires AM >=1.10)

This commit was SVN r17921.
2008-03-22 02:04:05 +00:00
Jeff Squyres
ace1717ca7 Patch from Brian to add in proper linker libraries
This commit was SVN r17919.
2008-03-21 23:00:54 +00:00
Jeff Squyres
05a7b1ed55 Remove svn:executable from these files.
This commit was SVN r17918.
2008-03-21 21:16:11 +00:00
Brian Barrett
f176c67cd2 Set the nodeid to something somewhat sane if we're not using modex, and
don't set the LOCAL flag just because both procs have an invalid nodeid.

This commit was SVN r17917.
2008-03-21 20:20:00 +00:00
Brian Barrett
5a7ebf5f25 Do not try to update the local process with modex information (from the local
process) as it stomps on information if the modex doesn't exist for the
current platform

This commit was SVN r17916.
2008-03-21 19:20:47 +00:00
Jeff Squyres
a4ec8a9d53 Spring cleaning -- no one is using this stuff; remove it from the tree.
This commit was SVN r17913.
2008-03-21 17:14:42 +00:00
Ralph Castain
f8642e9390 Add debug to tell us when we opened a socket and to whom
This commit was SVN r17911.
2008-03-21 15:47:47 +00:00
Ralph Castain
19ffdfef42 Add some debugging output to tell us what interfaces were considered and used by OOB
This commit was SVN r17909.
2008-03-21 15:35:40 +00:00
Jeff Squyres
4fbcb75ce8 With 5 commits over a 16 hour period and 3 broken tarball builds and a
still-broken trunk build on common platforms (e.g., 64 bit Linux
RHEL4U4), I think it's clear that this code is not ready for
prime-time.

I'm backing out all the commits in the trunk/ompi/op tree from r17901
onwards.  This code can be re-committed when compiles and runs on
common platforms.

cd ompi/op
svn merge -r 17907:17900 https://svn.open-mpi.org/svn/ompi/trunk/ompi/op .

This commit was SVN r17908.

The following SVN revision numbers were found above:
  r17901 --> open-mpi/ompi@b9520e61dc
2008-03-21 14:47:01 +00:00
Jeff Squyres
8284f64af1 With r17906, this commit should make the trunk compile again.
This commit was SVN r17907.

The following SVN revision numbers were found above:
  r17906 --> open-mpi/ompi@df4a6c3fc5
2008-03-21 13:49:23 +00:00
Rich Graham
df4a6c3fc5 fix function prototypes for new 3 buffer routines.
This commit was SVN r17906.
2008-03-21 13:44:15 +00:00
Ralph Castain
b2655ab585 Per Brian's suggestion, remove unnecessary library dependency - libtool automagically picks up the other libraries when we include libmpi
This commit was SVN r17905.
2008-03-21 12:47:04 +00:00
Rich Graham
0974160e29 correct several of the new macros.
This commit was SVN r17904.
2008-03-21 03:45:43 +00:00
Rich Graham
a7c836a2b0 fix location of the restrict key word.
Make the tag in the fan-in/fan-out algorithm be fragment based.

This commit was SVN r17903.
2008-03-21 01:40:36 +00:00
Rich Graham
2c66d396b7 take care of some bit-rot with the fanin-fanout method.
This commit was SVN r17902.
2008-03-21 01:08:49 +00:00
Rich Graham
b9520e61dc get the sm optimized allreduce working for all but user defined
operations.  Added to the reduction operations a set of reduction
functions that take 2 input buffers and one output buffer to avoid
some extra memory copies.  These can't be used with user defined
operations.  The intel c collective suite passes both original, and
new (new, not the user defined operations).

This commit was SVN r17901.
2008-03-20 23:51:16 +00:00
Galen Shipman
dcac824f59 Fix problem in releasing fragments during GET_END event (didn't check that
portals btl has ownership and therefor didn't free the frag as it should) this
causes leakage and hangs in MPI_Finalize. 

Also added a bit more debugging. 

This commit was SVN r17900.
2008-03-20 22:46:32 +00:00
Ralph Castain
c2fd5dd416 Clarify method used to translate application proc termination codes to exit status codes
This commit was SVN r17899.
2008-03-20 18:50:05 +00:00
Brian Barrett
2bf4784893 Set a meaningful orte_system_info.nodeid on Catamount
This commit was SVN r17898.
2008-03-20 16:55:57 +00:00
Ralph Castain
f8a10dfb93 Complete the fix of the orted vs mpirun race condition for finalizing. The darned mpirun is just too fast! Rather than try to slow it down, we set the orte_finalizing flag -prior- to telling mpirun the orted is leaving. This ensures we don't mistakenly declare the lifeline lost when mpirun leaves in a hurry.
This commit was SVN r17897.
2008-03-20 16:55:24 +00:00
Ralph Castain
6bb139e4f2 One more correction to mpirun exit codes - cleanup the application proc's exit codes in the orted so that non-zero exit codes generated by mpirun itself don't get "munged".
Modify the multi_abort function so they all return different exit codes - allows us to tell which one was being reported.

This commit was SVN r17895.
2008-03-20 13:54:11 +00:00
Ralph Castain
27a73ad9ee Fix a race condition between the orteds and HNP that can cause the orteds to output the "lost lifeline" message.
This has been a long-time problem. I tried to reduce the problem by having the orteds tell the HNP they were finalizing, and having the HNP wait until all orteds had reported or we timed out.

What was observed was that all the orteds were correctly reporting that they are leaving, but the HNP is able to exit before the orteds, thus closing the orteds lifeline socket and generating the error output. This is caused by the fact that the orteds have to whack all remaining session directories, which includes that blasted monster shared memory file! Cleaning up the SM file can take quite a while.

The HNP doesn't have that problem as there is no SM file there! So it gets out first.

What we had done in the past to resolve that problem was put a little test in the OOB that checks to see if we are finalizing. If we are, then we ignore the lifeline connection being lost. That check was still in the code - however, we had lost the line in orte_finalize that set the flag!!

This commit was SVN r17893.
2008-03-20 13:30:51 +00:00
Ralph Castain
8ee26a55ca Just turn these off for now - will revisit later
This commit was SVN r17891.
2008-03-20 13:25:35 +00:00
Jeff Squyres
e0fb3957cb Patch from Brian:
* The opal_sys_timer_get_cycles() call was implemented for
   Sparc v9 using inline assembly, but not in the assembly files.
   This would only currently matter on Linux Sparc systems using
   a compiler that didn't support inline assembly (not many of
   those), but it should be there for completion.
 * The linux timer component would always build on non-Alpha
   platforms, rather than only building on platforms where
   opal_sys_timer_get_cycles() was implemented.  This would
   only matter on a very narrow set of platforms that we don't
   really support, but still, it could be more right.  We now
   only build the component on platforms where we have the
   assembly call to get the cycle counter.
 * Added a comment to opal/sys/timer.h to note that the linux
   timer component needed to be updated if another platform was
   added.

This should be harmless to commit.  It will only really change
behaviors on platforms we don't have assembly support for, which
currently won't make it through configure.  It really only matters
when (if?) we support atomic operations through libatomic_ops.

This commit was SVN r17887.
2008-03-20 00:29:36 +00:00
Ralph Castain
67a2cc8a8e Fix a bug noted by Tim P where we would report the incorrect app_context as "not found". If you gave us the command line:
mpirun -n 1 hostname : -n 1 bogus

we would erroneously report that hostname had not been found instead of bogus.

This commit was SVN r17886.
2008-03-19 21:13:13 +00:00
Tim Mattox
715b05d663 Update the NEWS for 1.2.6.
This commit was SVN r17885.
2008-03-19 21:04:54 +00:00
Ralph Castain
ec64bf3da8 Clarify the error output so we can understand if it was a daemon or process that lost its lifeline
This commit was SVN r17880.
2008-03-19 19:06:52 +00:00