Commit graph

14985 commits

Author SHA1 Message Date
Ralph Castain
67027cfce4 Add a new "generic" ess module that supports direct launched MPI jobs so long as certain minimum envars are provided, regardless of launch environment.
This commit was SVN r23478.
2010-07-22 21:51:43 +00:00
Ralph Castain
e7719f0aa4 Update platform files, adjust sensor heartbeat module selection rules
This commit was SVN r23477.
2010-07-22 21:50:46 +00:00
Jeff Squyres
7d7c0aa48f Somehow the check for the specific value "external" got dropped in the
logic (even though the "else" clause for handling it was there).  This
commit puts back the specific check for the word "external".

Thanks to Jed Brown for noticing the issue.  Fixes trac:2503.

This commit was SVN r23475.

The following Trac tickets were found above:
  Ticket 2503 --> https://svn.open-mpi.org/trac/ompi/ticket/2503
2010-07-22 11:42:15 +00:00
Jeff Squyres
cb9f53cd50 Add some recent fixes:
 * MPI_GET_COUNT
 * One-sided operations with large target displacements

This commit was SVN r23473.
2010-07-22 02:37:47 +00:00
Jeff Squyres
51a051b072 This commit, along with r23467, r23468, r23470, r23471 should fix #2241.
This commit:

 * Adds the configury to figure out how many Fortran INTEGERs are 
   necessary to represent the C MPI_Status (which now includes a size_t
   member).
 * Sets MPI_STATUS_SIZE to this value in mpif-config.h.in.
 * Adds a big comment in status_c2f.c explaining why no changes
   were necessary to how we copy statuses between Fortran and C.

This commit was SVN r23472.

The following SVN revision numbers were found above:
  r23467 --> open-mpi/ompi@733d25a8a3
  r23468 --> open-mpi/ompi@963fcb13a5
  r23470 --> open-mpi/ompi@418b989781
  r23471 --> open-mpi/ompi@bc74a446ac
2010-07-22 02:23:47 +00:00
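The sizing rule in the commit above (how many Fortran INTEGERs cover the C status struct) comes down to a ceiling division. A minimal sketch in C, assuming a hypothetical struct layout and assuming a Fortran INTEGER is as wide as a C `int`; `example_status_t` and `fortran_integers_needed` are illustrative names, not taken from the Open MPI sources or its configure logic:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a C status struct: a few int members plus a
 * size_t count. This is an illustration, not Open MPI's real layout. */
typedef struct {
    int source;
    int tag;
    int error;
    int cancelled;
    size_t count;
} example_status_t;

/* Number of integers, each integer_bytes wide, needed to hold struct_bytes
 * bytes. Rounds up so that a partially used integer is still counted. */
static size_t fortran_integers_needed(size_t struct_bytes, size_t integer_bytes)
{
    assert(integer_bytes > 0);
    return (struct_bytes + integer_bytes - 1) / integer_bytes;
}
```

Under these assumptions, a value such as `fortran_integers_needed(sizeof(example_status_t), sizeof(int))` is what a build system would substitute for MPI_STATUS_SIZE.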
Jeff Squyres
bc74a446ac Add some comments to reinforce the fact that MPI applications should not be using the non-public members of ompi_status_public_t. Refs trac:2241.
This commit was SVN r23471.

The following Trac tickets were found above:
  Ticket 2241 --> https://svn.open-mpi.org/trac/ompi/ticket/2241
2010-07-22 01:59:33 +00:00
Jeff Squyres
418b989781 Divide by size, not status->_count. Gives a much better answer. :-)
This commit was SVN r23470.
2010-07-22 01:53:01 +00:00
Jeff Squyres
62fe827bdf s/MPI:Exception/MPI::Exception/g. Think of all the poor users who,
for years, were probably tremendously confused by this typo -- trying
to code their applications by catching MPI:Exception instances, but
failing to compile them.  "Why, cruel world, why?!"

Now we have fixed the error; all is right with the world again.

This commit was SVN r23469.
2010-07-22 01:24:12 +00:00
Jeff Squyres
963fcb13a5 If the value to be returned is larger than what can be represented in
the count parameter, then invoke MPI_ERR_TRUNCATE.

This commit was SVN r23468.
2010-07-22 01:15:46 +00:00
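The check described in the commit above can be sketched as follows. This is a simplified illustration, not the Open MPI implementation: the function name, parameter names, and return codes are hypothetical, and the sketch ignores the case where the byte total is not a whole multiple of the element size.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical return codes, used by this sketch only. */
enum { EXAMPLE_SUCCESS = 0, EXAMPLE_ERR_TRUNCATE = 1 };

/* Convert a received byte total into an element count that has to fit in
 * an int output parameter. Divides by the element size, then rejects any
 * result that an int cannot represent. */
static int example_get_count(size_t received_bytes, size_t type_size, int *count)
{
    assert(type_size > 0 && count != NULL);
    size_t elements = received_bytes / type_size;
    if (elements > (size_t)INT_MAX) {
        return EXAMPLE_ERR_TRUNCATE;  /* value does not fit in *count */
    }
    *count = (int)elements;
    return EXAMPLE_SUCCESS;
}
```

Keeping the byte total in a `size_t` and narrowing only at the end is what makes the overflow detectable before it is written to the `int`.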
George Bosilca
733d25a8a3 First step toward fixing the MPI_Get_count issues from ticket #2241. Next
step is the configure and Fortran mojo that Jeff will put in. Until then I
guess the Fortran interface is broken (at least all functions using the hidden
count field in the MPI_Status).

This commit was SVN r23467.
2010-07-21 20:07:00 +00:00
George Bosilca
3d3677fa7d Update the suppression rules for valgrind to hide the uninitialized byte
in the TCP BTL header.

This commit was SVN r23466.
2010-07-21 17:30:13 +00:00
Jeff Squyres
2af3e6e5ae Minor updates:
 * ompi_errcode_get_mpi_code() already checks for >0 error codes; the
   checks in OMPI_ERRHANDLER_INVOKE, OMPI_ERRHANDLER_CHECK, and
   OMPI_ERRHANDLER_RETURN were superfluous.
 * Make sure to use/return an OPAL_SOS-decoded value in
   ompi_errcode_get_mpi_code().
 * Symbols beginning with __ technically belong in the compiler
   namespace; we shouldn't be using those.
 * Other minor style updates in ompi_errcode_get_mpi_code().

This commit was SVN r23463.
2010-07-21 16:27:08 +00:00
Rolf vandeVaart
45019a3abf Correctly handle zero-length match fragment.
This commit was SVN r23459.
2010-07-21 15:27:06 +00:00
Jeff Squyres
5027915ead Hostname is not used in this function.
This commit was SVN r23454.
2010-07-21 11:07:28 +00:00
Jeff Squyres
29c1ad4196 Forgot BEGIN/END C_DECLS.
This commit was SVN r23453.
2010-07-21 11:05:08 +00:00
Jeff Squyres
b3952e4f07 Use const for the opal_fd_write() function, just to be nice.
This commit was SVN r23452.
2010-07-21 11:01:16 +00:00
Jeff Squyres
3031b59cfe Change to use the new opal_fd_*() functions.
This commit was SVN r23451.
2010-07-20 19:54:17 +00:00
Jeff Squyres
ab5fc1b570 Add trivial functions to loop over read()'ing and write()'ing with a
file descriptor (i.e., read and write complete messages, transparently
handling partial reads/writes, EAGAIN, and EINTR).

This code effectively already exists in a few places in the code base;
this is mainly a consolidation.

This commit was SVN r23450.
2010-07-20 19:53:49 +00:00
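The pattern this commit consolidates (looping until a complete message has been read or written) might look roughly like the sketch below. The names `example_write_fully` and `example_read_fully` are hypothetical; the actual opal_fd_*() signatures are not shown in this log and may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes to fd. Retries after short writes, EINTR, and
 * EAGAIN (on a non-blocking fd the EAGAIN retry spins; a real version
 * might wait for writability instead). Returns 0 on success, -1 on error. */
static int example_write_fully(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR || errno == EAGAIN) {
                continue;
            }
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes from fd. Returns 0 on success, -1 on error or if
 * end-of-file arrives before len bytes were read. */
static int example_read_fully(int fd, void *buf, size_t len)
{
    char *p = buf;
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n == 0) {
            return -1;  /* peer closed before the full message arrived */
        }
        if (n < 0) {
            if (errno == EINTR || errno == EAGAIN) {
                continue;
            }
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Callers then treat each call as all-or-nothing, which is why having one shared pair of helpers is simpler than repeating the loop at every call site.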
Jeff Squyres
35690ecad5 Fixes trac:2472. Use large integers to hold displacements for one-sided
operations, not ints. 

Sorry for the mid-day configure.ac change, folks...

This commit was SVN r23449.

The following Trac tickets were found above:
  Ticket 2472 --> https://svn.open-mpi.org/trac/ompi/ticket/2472
2010-07-20 18:45:48 +00:00
Jeff Squyres
64cb8f5d7f Another round of man page cleanups from Debian maintainer Manuel
Prinz.  Many thanks!

This commit was SVN r23445.
2010-07-20 14:07:18 +00:00
Jeff Squyres
e736281adf Add an extra pair of (), just for defensive programming.
This commit was SVN r23444.
2010-07-20 12:23:00 +00:00
Ralph Castain
f3e13b9766 Improve efficiency: make the uniqueness check for incoming hnp contact info much faster by including the hnp_uri in the job_family tracker object. Replace the global buffer storage with a quick routine to build the buffer from the jobfams array.
This commit was SVN r23443.
2010-07-20 08:30:47 +00:00
Nadia Derbey
837fb29fab Wrong event_type value passed in to show_help when getting xrc async events
This commit was SVN r23442.
2010-07-20 06:37:17 +00:00
Christopher Yeoh
cfea0db3a2 removes spurious compilation warning
This commit was SVN r23441.
2010-07-20 06:32:36 +00:00
Christopher Yeoh
8a3d5d4e1c Adds missing sys/stat.h include needed for more recent versions of glibc
This commit was SVN r23440.
2010-07-20 06:31:16 +00:00
Ralph Castain
f85e69b64b Continue enabling connect_accept across large numbers of independent jobs by replacing the hash tables with pointer_arrays to store routes to remote hnps to gain flexibility we'll need in the future.
This commit was SVN r23439.
2010-07-20 04:47:31 +00:00
Ralph Castain
248320b91a Enable connect_accept between multiple singleton jobs without the presence of an external rendezvous agent (e.g., ompi-server). This also enables connect_accept between processes in more than two jobs regardless of how they were started.
Create an ability to store the contact info for multiple HNPs being used to route between different job families. Modify the dpm orte module to pass the resulting store during the connect_accept procedure so that all jobs involved in the resulting communicator know how to route OOB messages between them.

Add a test provided by Philippe that tests this ability.

This commit was SVN r23438.
2010-07-20 04:22:45 +00:00
Ralph Castain
d9f7947a42 Arg - the other half of the prior commit: have the orted properly parse the resulting data to hand it to orte_odls.
Also, reindent per emacs (sigh).

This commit was SVN r23437.
2010-07-20 04:06:46 +00:00
Ralph Castain
72525a5850 Allow the passing of NULL to orte_plm.kill_local_procs to match what we allow for the equivalent orte_odls call
This commit was SVN r23436.
2010-07-20 04:05:41 +00:00
George Bosilca
519bbf6b6b Remove my patch (r23238) and push Scott Atchley patch. Thanks Scott.
This commit was SVN r23435.

The following SVN revision numbers were found above:
  r23238 --> open-mpi/ompi@c8ee150c95
2010-07-19 20:46:12 +00:00
Jeff Squyres
a8f69c9e3b This is no longer necessary; the orte DPM does the necessary
opal_progress_increment() and opal_progress_decrement().

This commit was SVN r23434.
2010-07-19 19:34:10 +00:00
Jeff Squyres
5ab634555a Apparently, Cisco plans to be working on Open MPI for a veeeeery long time!
This commit was SVN r23433.
2010-07-19 19:31:59 +00:00
Ralph Castain
ad5eaee4c6 Protect against NULL and provide additional resource check/error report
This commit was SVN r23432.
2010-07-19 18:33:32 +00:00
Ralph Castain
0355da6335 Cleanup pointer array addressing and protect against NULL
This commit was SVN r23431.
2010-07-19 18:30:04 +00:00
Ralph Castain
3f7f8df40f Fix singleton comm_spawn by having the host orted create a map object for the singleton job. Even though this isn't filled in, the pidmap function will ignore any job that doesn't have a map object. Thus, this ensures that the singleton's location is included in the pidmap, thereby allowing messages to be routed to/from it.
This commit was SVN r23430.
2010-07-18 02:48:17 +00:00
Ralph Castain
12cd07c9a9 Start reducing our dependency on the event library by removing at least one instance where we use it to redirect the program counter. Rolf reported occasional hangs of mpirun in very specific circumstances after all daemons were done. A review of MTT results indicates this may have been happening more generally in a small fraction of cases.
The problem was tracked to use of the grpcomm.onesided_barrier to control daemon/mpirun termination. This relied on messaging -and- required that the program counter jump from the errmgr back to grpcomm. On rare occasions, this jump did not occur, causing mpirun to hang.

This patch looks more invasive than it is - most of the affected files simply had one or two lines removed. The essence of the change is:

* pulled the job_complete and quit routines out of orterun and orted_main and put them in a common place

* modified the errmgr to directly call the new routines when termination is detected

* removed the grpcomm.onesided_barrier and its associated RML tag

* added a new "num_routes" API to the routed framework that reports back the number of dependent routes. When route_lost is called, the daemon's list of "children" is checked and adjusted if that route went to a "leaf" in the routing tree

* used connection termination between daemons to track rollup of the daemon tree. Daemons and HNP now terminate once num_routes returns zero

Also picked up in this commit is the addition of a new bool flag to the app_context struct, and increasing the job_control field from 8 to 16 bits. Both trivial.

This commit was SVN r23429.
2010-07-17 21:03:27 +00:00
Ralph Castain
acd990ffe5 Add static configuration for IU and clarify its param files. Update cisco platform file
This commit was SVN r23428.
2010-07-17 20:21:23 +00:00
Jeff Squyres
088887d850 Some updates:
 * Add notes about libompitrace.
 * Add some more notes about --disable switches.
 * Remove some notes that are no longer necessary. 

This commit was SVN r23427.
2010-07-16 13:20:11 +00:00
Jeff Squyres
c3cc8151c7 Add --disable-<ompi-contrib-name> switch. Thanks to Kevin Buckley for
the initial patch.

This commit was SVN r23426.
2010-07-16 13:19:20 +00:00
Donald Kerr
f79c89e0e9 Help maintain the order established and defined during mca_btl_openib_add_procs()
This commit was SVN r23425.
2010-07-16 13:13:37 +00:00
Rolf vandeVaart
3abb5556a6 Fix bug pointed out by George Bosilca. Also
remove unneeded temp variable.

This commit was SVN r23424.
2010-07-15 19:32:31 +00:00
Shiqing Fan
5789c96525 Add the help and btl ini file into install list.
This commit was SVN r23423.
2010-07-15 18:52:29 +00:00
Ralph Castain
23b166cbd4 Initialize pointers to NULL
This commit was SVN r23422.
2010-07-15 16:33:57 +00:00
Ralph Castain
a99e8cf132 Re-issue the read event so that multiple debugger attachments can occur
This commit was SVN r23421.
2010-07-15 15:35:26 +00:00
Jeff Squyres
57d89d1c0c Remove a lot of cruft from the hwloc paffinity directory that we're
not using in Open MPI (i.e., that stuff is only used in the standalone
builds of hwloc -- it's not compiled/installed/used by Open MPI).

This commit was SVN r23416.
2010-07-14 20:46:47 +00:00
Jeff Squyres
b7a57ffb66 set more ignore on bfo
This commit was SVN r23415.
2010-07-14 15:55:44 +00:00
Shiqing Fan
5184208fca Correct the CMake temporary path.
This commit was SVN r23414.
2010-07-14 13:33:35 +00:00
Shiqing Fan
30c9f9c097 A few more files need to be excluded from the Windows build source.
This commit was SVN r23413.
2010-07-14 11:25:30 +00:00
Rolf vandeVaart
b7a27ab36a Add support for openib BTL failover to be used with bfo PML.
By default, feature is configured out so no effect on 
normal operation.

This commit was SVN r23412.
2010-07-14 10:08:19 +00:00
Shiqing Fan
5b37e2922c Use semicolon as the separator for Windows, as colon is normally part of the Windows path.
This commit was SVN r23411.
2010-07-14 09:12:10 +00:00
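To illustrate why a colon is a poor list separator on Windows, here is a small sketch in C. The macro and function names are hypothetical and are not taken from the Open MPI sources; the sample paths are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-platform separator for lists of paths. */
#ifdef _WIN32
#define EXAMPLE_PATH_LIST_SEP ';'
#else
#define EXAMPLE_PATH_LIST_SEP ':'
#endif

/* Count the entries in a path list when it is split on sep.
 * An empty string counts as zero entries. */
static size_t example_count_paths(const char *list, char sep)
{
    assert(list != NULL);
    if (*list == '\0') {
        return 0;
    }
    size_t n = 1;
    for (const char *p = strchr(list, sep); p != NULL; p = strchr(p + 1, sep)) {
        n++;
    }
    return n;
}
```

For the list `C:\ompi\lib;D:\plugins`, splitting on `;` gives the two intended entries, while splitting on `:` cuts at the drive letters and yields three fragments, none of which is a valid path.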