
3405 Commits

Author SHA1 Message Date
Jeff Squyres
3966e30902 Remove every part of MPI-2 one-sided functionality from the tree with
#if OMPI_WANT_MPI2_ONE_SIDED and some automake conditionals.  Also had
to add some AC_SUBSTs to eliminate part of mpif.h (otherwise the
"external" statements would have made undefined symbols).

All the MPI-2 one-sided functionality (including the skeleton
top-level MPI API functions that only invoke an MPI exception) can be
re-enabled with --enable-mpi2-one-sided.

This commit was SVN r3802.
2004-12-14 02:35:03 +00:00
Jeff Squyres
41adbce3b0 Fix compiler warning: remove unused variable
This commit was SVN r3801.
2004-12-14 02:13:29 +00:00
Jeff Squyres
900f3bde32 - Remove some old debugging printf's
- Formalize the Windoze output
- The facilities for lazy opening of files were already included in
  here (yay foresight!); so just don't open the file aggressively when
  we ompi_output_open(); instead, let the first ompi_output() open the
  file.  This allows the session directory to stay empty (and
  therefore removable) if the file is never written to.

This commit was SVN r3800.
2004-12-13 22:39:09 +00:00
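The lazy-open behavior described in r3800 can be sketched roughly as follows. This is a minimal illustration, not the actual ompi_output code: the type `lazy_stream_t` and the functions `lazy_open`, `lazy_write`, and `lazy_close` are hypothetical names chosen for this example. The idea is that "open" only records the path, and the file is created by the first write.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stream descriptor; not the real ompi_output structure. */
typedef struct {
    const char *path;  /* where output would go */
    FILE *fp;          /* stays NULL until the first write */
} lazy_stream_t;

/* "Open" only records the path; no file is created yet. */
static void lazy_open(lazy_stream_t *s, const char *path)
{
    assert(s != NULL && path != NULL);
    s->path = path;
    s->fp = NULL;
}

/* The first write creates the file. Returns 0 on success, -1 on failure. */
static int lazy_write(lazy_stream_t *s, const char *fmt, ...)
{
    va_list ap;
    int rc;

    if (s->fp == NULL) {
        s->fp = fopen(s->path, "a");
        if (s->fp == NULL) {
            return -1;
        }
    }
    va_start(ap, fmt);
    rc = vfprintf(s->fp, fmt, ap);
    va_end(ap);
    return rc < 0 ? -1 : 0;
}

/* Close the file only if a write ever opened it. */
static void lazy_close(lazy_stream_t *s)
{
    if (s->fp != NULL) {
        fclose(s->fp);
        s->fp = NULL;
    }
}
```

If nothing is ever written, no file appears in the directory, which is what lets the session directory stay empty and removable.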
Jeff Squyres
49c6784a21 Add a missing OMPI_THREAD_UNLOCK. Doh!
This commit was SVN r3799.
2004-12-13 22:06:40 +00:00
George Bosilca
32285cb1f5 Handle the unexpected messages.
This commit was SVN r3798.
2004-12-13 21:44:02 +00:00
Jeff Squyres
50afa840cc Add simple wrappers around malloc / free so that these functions are
at least usable -- they just don't provide any benefit for message
passing yet.

This commit was SVN r3797.
2004-12-13 20:02:02 +00:00
Brian Barrett
cb89d32009 * Cleanup of name server usage in the pcm (also known as actually using the
name server).
* Add return status message for kill messages from the contact pcm doing 
  the actual killing so that MPI_Abort or ompi_rte_{kill,term}_{proc,job}
  have a useful return value.
* Cleanup the per-started-process local storage in the pcm base to not have
  so much code duplication
* Since there are now 4 kill functions (signal or terminate a proc or job)
  combine them into one function in the pcm interface.  Makes life easier
  all around for PCM authors.  Already had to combine for the message
  transfer.
* Fix race condition in the bootproxy code that was causing it not to
  have the right count for number of alive processes

This commit was SVN r3796.
2004-12-13 15:41:59 +00:00
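One way to picture the combined kill entry point from r3796 is a single request that carries both the scope (process or job) and the mode (signal or terminate). The sketch below is an assumption about the general shape only; `kill_scope_t`, `kill_mode_t`, `kill_request_t`, and `build_kill_request` are hypothetical names and not the real pcm interface.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the actual pcm interface. */
typedef enum { KILL_SCOPE_PROC, KILL_SCOPE_JOB } kill_scope_t;
typedef enum { KILL_MODE_SIGNAL, KILL_MODE_TERMINATE } kill_mode_t;

/* One request layout covers all four cases, so the kill message
 * needs only a single format as well. */
typedef struct {
    kill_scope_t scope;
    kill_mode_t mode;
    int target;   /* process or job identifier */
    int signo;    /* signal to deliver */
} kill_request_t;

/* Fill in a request. Returns 0 on success, -1 on bad parameters. */
static int build_kill_request(kill_scope_t scope, kill_mode_t mode,
                              int target, int signo, kill_request_t *out)
{
    assert(out != NULL);
    if (target < 0) {
        return -1;
    }
    if (mode == KILL_MODE_TERMINATE) {
        signo = SIGTERM;   /* terminate ignores the caller's signo */
    } else if (signo <= 0) {
        return -1;         /* signal mode needs a real signal number */
    }
    out->scope = scope;
    out->mode = mode;
    out->target = target;
    out->signo = signo;
    return 0;
}
```

A component author then implements one handler that switches on `scope` and `mode` instead of four near-identical functions.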
Brian Barrett
a1aeb8dab3 * Add some checks for bad parameters so we don't have frees of NULL
pointers and the like

This commit was SVN r3795.
2004-12-13 14:11:38 +00:00
George Bosilca
904d075c39 Resolve the fragmentation problem with contiguous data. Now we respect the maximum
amount suggested by the caller, and update the fields correctly.

This commit was SVN r3794.
2004-12-13 05:40:56 +00:00
George Bosilca
0149a8ca06 Just a checkpoint. After a week of GM optimizations there is just one conclusion: too many optimizations
break the code. On the other hand tomorrow I will have 6 hours in the plane ...

This commit was SVN r3793.
2004-12-13 05:38:34 +00:00
Craig E Rasmussen
4857cd492a Fixed type of disp parameter in file_get_byte_offset_f to MPI_Offset.
This commit was SVN r3792.
2004-12-13 03:37:03 +00:00
Craig E Rasmussen
7827177be5 Fixed type of disp parameter in file_get_byte_offset_f to MPI_Offset.
This commit was SVN r3791.
2004-12-13 03:35:27 +00:00
Jeff Squyres
5b2063dc8a Oops -- #undef MPIO_Request, MPIO_Test, and MPIO_Wait when we're
compiling ROMIO itself.

This commit was SVN r3790.
2004-12-13 00:22:00 +00:00
Ralph Castain
2cb5de1a32 Also helps if you actually *send* the synchro notification message instead of just *queueing* it....grrr.
Okay Brian - now it truly is ready to run.

This commit was SVN r3789.
2004-12-12 17:19:31 +00:00
Ralph Castain
367d13fde6 Fix a goof that blocked cleanup of a specified process - it always cleaned up itself instead. Add a debug statement to proxy_cleanup.
Brian - you should be good to roll now. My apologies.

This commit was SVN r3788.
2004-12-12 17:07:20 +00:00
Jeff Squyres
8d82071f3b Oops -- put back Tim's workaround.
This commit was SVN r3787.
2004-12-12 15:31:11 +00:00
Jeff Squyres
ac5f313af8 First cut at non-blocking IO progress. No asynchronous progress
support is included because ROMIO is inherently thread-unsafe.

One possible way to have true asynchronous progress would be to use a
progress thread that wakes up and polls at some frequency when there
are non-blocking IO requests pending.  This is pretty icky, though --
it should definitely have an MCA parameter to enable/disable this
functionality, as well as another to control the polling frequency.

This also strengthens the argument that we need a v2 of the io
framework -- one that is not designed to exclusively support ROMIO --
one that does something unimaginably "better" for the parallel MPI-2
IO interface.  :-)

This commit was SVN r3786.
2004-12-12 15:29:29 +00:00
Brian Barrett
d6e0552080 * Make sure all fields are filled in when setting process status
* add debugging code for the callback.  Since gdb doesn't really like
  doing things like waitpid for processes, spin when we are in the
  handler in a way that gdb can easily attach and debug

This commit was SVN r3785.
2004-12-12 03:02:07 +00:00
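The "spin so gdb can attach" trick mentioned in r3785 is a common debugging idiom. The sketch below shows the general pattern under stated assumptions; `spin_until_released` and its parameters are hypothetical, and the poll limit is added here only so the loop cannot hang forever.

```c
#include <assert.h>
#include <unistd.h>

/* Spin until a debugger sets *released to nonzero, or until max_polls
 * polls have elapsed. Returns 0 if released, -1 if the limit was hit.
 * The flag is volatile so the compiler re-reads it on every pass. */
static int spin_until_released(volatile int *released,
                               unsigned max_polls, unsigned sleep_sec)
{
    unsigned polls = 0;

    assert(released != NULL);
    while (!*released && polls < max_polls) {
        if (sleep_sec > 0) {
            sleep(sleep_sec);   /* keep the spin cheap */
        }
        ++polls;
    }
    return *released ? 0 : -1;
}
```

With the process parked in this loop, a developer can attach with `gdb -p <pid>`, move to the frame that holds the flag, change it with `set var`, and continue stepping from there.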
Rich Graham
fa98bd54c7 terminate the shared memory thread cleanly, at the end of the
job.  In mca_ptl_sm_component_close send termination request.

This commit was SVN r3784.
2004-12-11 23:29:47 +00:00
Jeff Squyres
a120864786 Remove extraneous debugging echo statement
This commit was SVN r3783.
2004-12-11 01:42:56 +00:00
Jeff Squyres
cd0a0a7958 - Fix docs
- Add {} to be safer / conformant

This commit was SVN r3782.
2004-12-11 01:33:05 +00:00
Jeff Squyres
9f6cbfde9b Remove duplicate copyright
This commit was SVN r3781.
2004-12-11 01:32:17 +00:00
Rich Graham
b409cda62e peers[] was being initialized twice - once after it was actually
set.

This commit was SVN r3780.
2004-12-10 22:31:29 +00:00
Ralph Castain
5727a88632 Fix a memory bug in ompid...helps if you load all the new process status fields.
This commit was SVN r3779.
2004-12-10 22:24:24 +00:00
Tim Woodall
8a496b63ee for now - always progress oob otherwise connect/accept/spawn are broken
This commit was SVN r3778.
2004-12-10 20:44:22 +00:00
Tim Woodall
082beac961 gpr callback frees the message - don't do it here
This commit was SVN r3777.
2004-12-10 20:43:37 +00:00
Mitch Sukalski
1741006a4e first version of memory registration bug fixes: RDMA write target
memory freed on receipt of FIN packet with memory region information
from the sender (mca_ptl_ib_fin_header_t); RDMA write source
memory freed on completion notification from RDMA write...

This commit was SVN r3776.
2004-12-10 20:14:12 +00:00
Brian Barrett
e5605b3e1b * ignore the non-RSH pcms for a day or so while I get the job cleanup for
RSH figured out

This commit was SVN r3775.
2004-12-10 20:03:33 +00:00
Ralph Castain
72e1161605 Clean up the notify message class - missing the ompi_object_t element in structure.
This commit was SVN r3774.
2004-12-10 17:53:41 +00:00
Craig E Rasmussen
a4847ca677 Removed initial junk that was just for testing things out.
This commit was SVN r3773.
2004-12-10 17:28:41 +00:00
Ralph Castain
65f874c556 Fix some memory problems. This will temporarily break comm_spawn again - Tim needs to grab this to take a look at something in connect_accept.
This commit was SVN r3772.
2004-12-10 16:27:09 +00:00
Rich Graham
0d76949743 add #include <sys/stat.h> for the mkfifo() prototype.
This commit was SVN r3771.
2004-12-10 04:32:49 +00:00
Ralph Castain
2ee47e0708 Some cleanup at the end of the comm_spawn work.
Comm_spawn is now fully functional. I'll send out a separate message about some of the problems encountered, and resulting action items.

This commit was SVN r3770.
2004-12-10 02:34:19 +00:00
Tim Woodall
73a9e95816 test for null set of procs
This commit was SVN r3769.
2004-12-10 00:04:00 +00:00
Tim Woodall
04cb323004 - corrections for persistent requests
- sm/mx/tcp/self now pass all intel p2p tests (single threaded)

This commit was SVN r3768.
2004-12-09 23:58:23 +00:00
Rich Graham
29ebe90676 start to debug the non-symmetric shared memory case. Ping-pong
w/o threads runs correctly at this stage.  fixed errors with using
addresses valid in remote memory scope, not local memory scope.

This commit was SVN r3767.
2004-12-09 23:09:06 +00:00
Tim Woodall
1da70cd6a5 removed debug
This commit was SVN r3766.
2004-12-09 21:19:49 +00:00
Ralph Castain
be53d9d720 Comm_spawn stuff
This commit was SVN r3765.
2004-12-09 20:54:31 +00:00
Brian Barrett
cfe68a959b * add missing headers
This commit was SVN r3764.
2004-12-09 19:46:03 +00:00
Ralph Castain
3c29621a4f A little bit further.....transferring to Tim's box for further work.
This commit was SVN r3763.
2004-12-09 17:04:59 +00:00
Mitch Sukalski
02d5381c25 updated IB memory registry support with new public
interfaces for the other IB PTL code

This commit was SVN r3762.
2004-12-09 16:28:38 +00:00
Brian Barrett
13d23efb65 * ignore the elan ptl to save some build time
* remove ignore on the bjs discovery / mapping code.  It seems to work
  correctly
* Fix up svn:ignore properties in the dynamic-mca directory

This commit was SVN r3761.
2004-12-09 15:15:50 +00:00
Jeff Squyres
ce3d8f3812 Part 2 of the .ompi_unignore patch
This commit was SVN r3760.
2004-12-09 12:49:54 +00:00
Ralph Castain
95510d3684 Remove the diagnostic messages from the xcast. Move one step closer on comm_spawn....
This commit was SVN r3759.
2004-12-09 06:32:44 +00:00
Brian Barrett
017d39a633 * Be nice to users and resolve "localhost" or "127.0.0.1" to the current
node, since the default hostfile shipped with Open MPI contains only
  "localhost".  This is performed after all the other checks, so it should
  only get activated as a last-ditch effort.

This commit was SVN r3758.
2004-12-09 06:01:41 +00:00
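The ordering described in r3758 (normal checks first, "localhost" only as a last resort) might look something like the sketch below. The function `names_local_node` is hypothetical, and the only "other check" modeled here is an exact hostname match.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if a hostfile entry refers to the current node, else 0.
 * my_hostname is whatever the caller obtained for this node. */
static int names_local_node(const char *entry, const char *my_hostname)
{
    assert(entry != NULL && my_hostname != NULL);

    /* Regular check first: does the entry match our own name? */
    if (strcmp(entry, my_hostname) == 0) {
        return 1;
    }
    /* Last-ditch effort: treat the loopback names as "this node". */
    if (strcmp(entry, "localhost") == 0 ||
        strcmp(entry, "127.0.0.1") == 0) {
        return 1;
    }
    return 0;
}
```

In a real caller, `my_hostname` would typically come from `gethostname()`.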
Brian Barrett
c0f7b4580e * Fix dumb mistake in BJS llm where code was using an uninitialized variable
as loop ending condition
* Add resolver code for a couple of different node formats for BProc.  Given
  hostfiles can now be a combination of notations:

0
n1
2
master
self

This commit was SVN r3757.
2004-12-09 05:06:02 +00:00
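A resolver for the mixed hostfile notations listed in r3757 could be sketched as below. This is only an illustration of the parsing step: `node_kind_t` and `parse_node` are hypothetical, and how "master" and "self" map onto actual BProc node numbers is deliberately left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical classification of one hostfile entry. */
typedef enum { NODE_NUMBERED, NODE_MASTER, NODE_SELF, NODE_INVALID } node_kind_t;

/* Classify an entry such as "0", "n1", "master", or "self".
 * For numbered entries the node number is stored in *number. */
static node_kind_t parse_node(const char *entry, long *number)
{
    const char *digits = entry;
    char *end;

    assert(entry != NULL && number != NULL);
    if (strcmp(entry, "master") == 0) {
        return NODE_MASTER;
    }
    if (strcmp(entry, "self") == 0) {
        return NODE_SELF;
    }
    if (entry[0] == 'n') {
        digits = entry + 1;   /* "n1" form: skip the prefix */
    }
    if (!isdigit((unsigned char)digits[0])) {
        return NODE_INVALID;
    }
    *number = strtol(digits, &end, 10);
    return (*end == '\0') ? NODE_NUMBERED : NODE_INVALID;
}
```

Under these assumptions, both "2" and "n2" resolve to node number 2, while anything with trailing non-digits is rejected.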
Ralph Castain
ed197f0186 More minor changes that continue to make progress on comm_spawn. Nothing significant - no impact on other operations.
PLEASE NOTE: there are some diagnostic messages in oob_xcast that will print out. Please don't have a cow about them - they won't hurt or injure anyone, and it's just there for a little while to help Tim and me debug a problem. Just didn't want to create yet another MCA parameter to debug 10 lines of code. :-)

This commit was SVN r3756.
2004-12-09 04:54:37 +00:00
Mitch Sukalski
9b2a2f5247 support for InfiniBand memory registry - registry uses hashed array
for hints, and an ompi_rb_tree_t for all other search operations;
registration occurs only when a different memory range or access
privileges are needed; deregistration is lazy, and only done when
registration fails with VAPI_EAGAIN (no resources...)

This commit was SVN r3755.
2004-12-09 00:45:45 +00:00
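The lazy deregistration policy from r3755 can be illustrated with a small simulation. This sketch swaps in a plain array for the hashed array and red-black tree, ignores access privileges, and replaces the VAPI calls with a stand-in `hw_register` that fails once a fixed limit is reached. Every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REGIONS 4   /* cache slots (hypothetical) */
#define HW_LIMIT    2   /* pretend the adapter holds only two registrations */

typedef struct {
    size_t base, len;
    int refs;   /* active users */
    int live;   /* still registered with the simulated adapter */
} region_t;

typedef struct {
    region_t r[MAX_REGIONS];
    int hw_used;
} registry_t;

/* Stand-in for the real registration call; fails when resources run out. */
static int hw_register(registry_t *reg)
{
    if (reg->hw_used >= HW_LIMIT) return -1;
    reg->hw_used++;
    return 0;
}

/* Lazy deregistration: drop idle regions only when forced to. */
static void evict_idle(registry_t *reg)
{
    int i;
    for (i = 0; i < MAX_REGIONS; i++) {
        if (reg->r[i].live && reg->r[i].refs == 0) {
            reg->r[i].live = 0;
            reg->hw_used--;
        }
    }
}

static region_t *registry_acquire(registry_t *reg, size_t base, size_t len)
{
    int i;
    /* Reuse an existing registration that already covers the range. */
    for (i = 0; i < MAX_REGIONS; i++) {
        region_t *r = &reg->r[i];
        if (r->live && base >= r->base && base + len <= r->base + r->len) {
            r->refs++;
            return r;
        }
    }
    /* Register; on resource exhaustion, evict idle regions and retry once. */
    if (hw_register(reg) != 0) {
        evict_idle(reg);
        if (hw_register(reg) != 0) return NULL;
    }
    for (i = 0; i < MAX_REGIONS; i++) {
        region_t *r = &reg->r[i];
        if (!r->live) {
            r->base = base; r->len = len; r->refs = 1; r->live = 1;
            return r;
        }
    }
    reg->hw_used--;   /* no cache slot free; undo the registration */
    return NULL;
}

/* Releasing drops the reference but leaves the region registered. */
static void registry_release(region_t *r)
{
    assert(r != NULL && r->refs > 0);
    r->refs--;
}
```

The point of the design is that a released region stays registered and can be reused for free; the cost of deregistration is paid only when a new registration would otherwise fail.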
Ralph Castain
d21c0027df Well, we are getting closer to resolving the comm_spawn problem. For the benefit of those that haven't been in the midst of this discussion, the problem is that this is the first case where the process starting a set of processes has not been mpirun and is not guaranteed to be alive throughout the lifetime of the spawned processes. This sounds simple, but actually has some profound impacts.
Most of this checkin consists of more debugging stuff. Hopefully, you won't see any printf's that aren't protected by debug flags - if you do, let me know and I'll take them out with my apologies.

Outside of debugging, the biggest change was a revamp of the shutdown process. For several reasons, we had chosen to have all processes "wait" for a shutdown message before exiting. This message is typically generated by mpirun, but in the case of comm_spawn we needed to do something else. We have decided that the best way to solve this problem is to:

(a) replace the shutdown message (which needed to be generated by somebody - usually mpirun) with an oob_barrier call. This still requires that the rank 0 process be alive. However, we terminate all processes if one abnormally terminates anyway, so this isn't a problem (with the standard or our implementation); and

(b) have the state-of-health monitoring subsystem issue the call to clean up the job from the registry. Since the state-of-health subsystem isn't available yet, we have temporarily assigned that responsibility to the rank 0 process.  Once the state-of-health subsystem is available, we will have it monitor the job for all-processes-complete and then it can tell the registry to clean up the job (i.e., remove all data relating to this job).

Hope that helps a little. I'll put all this into the design docs soon.

This commit was SVN r3754.
2004-12-08 21:44:41 +00:00
Tim Woodall
8749d0fe39 changed indexing to use smp index
This commit was SVN r3753.
2004-12-08 21:27:48 +00:00