Commit graph

4770 commits

Author SHA1 Message Date
George Bosilca
c7acb3bc5f Still sync with the beta ...
Use the correct indentation.
Now we can force the progress function to grab as many events as possible
(in order to avoid starvation for the send queue).
Add more elems in the unexpected queue (internal buffers used to temporarily
store the data for the unexpected messages).
Decrease the number of variables in some functions (cleanup).
Avoid using goto ...

This commit was SVN r5949.
2005-06-06 18:42:24 +00:00
George Bosilca
462fe884c8 Still bringing the trunk in sync with the beta.
This one removes some useless code and some compilation warnings.

This commit was SVN r5948.
2005-06-06 18:35:02 +00:00
George Bosilca
593f7c1be5 Do not set the stack for this convertor. As NOBODY is supposed to use it we can have it uninitialized in order to decrease the initial overhead for the first fragment.
This commit was SVN r5947.
2005-06-06 18:33:52 +00:00
George Bosilca
43876d080c Bringing trunk in sync with the beta.
Split the datatypes in 3 categories:
1. basic datatypes: count always one and the datatype is always
contiguous
2. complex datatypes composed of one basic type with a count. Most of
the time these datatypes will be contiguous.
3. complex datatypes composed of 2 basic types. Depending on the
architecture these types can be non contiguous.

Reorder the defines to match the previous categories.  Add some
comment to describe the changes in the files.
Clean-up the flags:
- DT_FLAG_PREDEFINED is attached to all predefined datatypes.
- DT_FLAG_CONTIGUOUS is attached to all contiguous ones. This flag is
  detected at runtime depending on the architecture.

This commit was SVN r5946.
2005-06-06 18:19:11 +00:00
Tim Woodall
e830442d04 bytes_received is initialized and passed in
This commit was SVN r5945.
2005-06-06 16:56:42 +00:00
George Bosilca
ea1872f1d3 Correctly handle the length for unexpected messages (add one more field in the recv frag
struct).

This commit was SVN r5944.
2005-06-06 16:19:20 +00:00
Tim Woodall
3c4c272714 constant for maximum number of allowed segments in a descriptor
This commit was SVN r5943.
2005-06-06 16:18:57 +00:00
Tim Woodall
bd59bb4a16 handle receive of user data
This commit was SVN r5942.
2005-06-06 16:17:43 +00:00
Jeff Squyres
9e9a93e2ab Fixes the BJS compile problem by updating it to use the new API. The
-bynode and -byslot orterun command line parameters now set a single
MCA param (ras_base_schedule_policy) which is looked up by the
following components to decide which RAS base API function to invoke:

ras base bjs
ras base host
rmaps round_robin

This commit was SVN r5941.
2005-06-06 13:43:20 +00:00
Jeff Squyres
620e55516e Change the perms of the file to be 0640, not 0666.
This commit was SVN r5940.
2005-06-06 13:36:41 +00:00
Tim Woodall
207032a10e fix for "missing" function
This commit was SVN r5933.
2005-06-03 16:42:00 +00:00
Brian Barrett
690d03eea8 * back out r5925, which added a second pty for stdin
* turn off echoing on the pty (which was what r5925 was trying to
  do).

With this patch, stdin forwarding to rank 0 looks good, with the exception
of the initialization delay until stage gate 1.

This commit was SVN r5930.

The following SVN revision numbers were found above:
  r5925 --> open-mpi/ompi@e406a4b1aa
2005-06-02 21:15:26 +00:00
Jeff Squyres
0ffbc5506a Add bmi into the SUBDIRS list.
This commit was SVN r5928.
2005-06-02 20:26:38 +00:00
Tim Woodall
878490a405 return resources
This commit was SVN r5927.
2005-06-02 19:51:51 +00:00
Tim Woodall
5a00e53ab1 init lock state
This commit was SVN r5926.
2005-06-02 19:36:35 +00:00
Tim Woodall
e406a4b1aa corrections for stdin - stdin shouldn't be using same pty
as stdout - corrects duplicate output

This commit was SVN r5925.
2005-06-02 19:17:32 +00:00
Tim Woodall
78d713b70a misc fixes
This commit was SVN r5924.
2005-06-02 17:42:53 +00:00
Tim Woodall
a5d13104f9 acks should now be handled correctly for stdin
This commit was SVN r5923.
2005-06-02 17:06:33 +00:00
Tim Woodall
5728748980 handle local acks correctly
This commit was SVN r5922.
2005-06-02 17:05:52 +00:00
Tim Woodall
e21ffa48e2 bug fix - generate a completion callback for local sends
This commit was SVN r5921.
2005-06-02 17:03:08 +00:00
Brian Barrett
c073999c7f * fix XGrid to match API change
This commit was SVN r5917.
2005-06-02 01:43:05 +00:00
Brian Barrett
42f8ea9389 * when bringing over from branch, forgot to fix up for the local fork()
setup.  Now uses fork() instead of ssh if the target nodename is the
  same as the current nodename (which will happen if the user gave
  "localhost" or just the hostname without the domain) or if the
  target nodename is local according to ompi_ifislocal() (which will
  happen if the user gave a FQDN)

This commit was SVN r5916.
2005-06-01 21:45:26 +00:00
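The launch decision described in r5916 can be sketched in Python as follows. This is a minimal illustration, not the rsh pls code: `choose_launcher` is a hypothetical name, and the `is_local` callback merely stands in for the check the message attributes to ompi_ifislocal().

```python
def choose_launcher(target_nodename, current_nodename, is_local):
    """Pick "fork" or "ssh" for starting a process on target_nodename.

    is_local is a caller-supplied predicate (an assumption of this sketch)
    that reports whether a hostname refers to the local machine.
    """
    # Case 1: the names match outright, e.g. the user gave "localhost"
    # (already resolved to the local nodename) or the short hostname.
    if target_nodename == current_nodename:
        return "fork"
    # Case 2: the names differ textually (e.g. a FQDN) but the target
    # still resolves to this machine.
    if is_local(target_nodename):
        return "fork"
    # Anything else is a remote node.
    return "ssh"
```

Keeping the cheap string comparison first means the interface lookup is only needed when the two names differ.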
Tim Woodall
e7332d0521 cleanup - support for sender side scheduling (non-rdma case)
This commit was SVN r5915.
2005-06-01 21:09:43 +00:00
Brian Barrett
8b411a10be - revert to resolving "localhost" to the contents of
orte_system_info.nodename so that cleanup and the like occur
  correctly.  Otherwise, the daemon on localhost and an MPI process
  can have different ideas on what the local nodename is, and that
  leads to all kinds of badness with both process killing and cleanup.
  Also fixes the annoying ssh keys problem when sshing to localhost.
- modify the rsh pls to ssh to localhost if the target nodename is the
  same as orte_system_info.nodename AND is not resolvable (i.e., ssh to
  it would fail).  Otherwise, ssh to nodename.  This should work around
  the issues Ralph was seeing with ssh failing on his laptop (since
  the above change undid the previous fix to this problem).
- Small change to ompi_ifislocal() to squelch a warning message about
  unresolvable hostnames when checking to see if a name is, in fact,
  resolvable.
- Force ORTE process to have same nodename field as its starting
  daemon (assuming it was started using the fork pls), so that the
  fork pls can properly kill the process, and cleanup its session
  directory on abnormal exit.

This commit was SVN r5914.
2005-06-01 19:30:05 +00:00
Brian Barrett
465b54a3f0 * start of support for stdin forwarding. stdin is now forwarded to vpid 0
of the started job (which should be rank 0 of the started MPI job).  Still
  some issues for Tim / Ralph to work out (below).  Only works from MPI_Init
  onward.  Remaining issues:

     - Need to move the orte_rmgr_urm_wireup_stdin() call from STG1 to
       when everyone sets LAUNCHED state.  Tim/Ralph are going to look
       at adding this code
     - stdin frags are not properly acked, leading to some shutdown
       workarounds.  Tim is going to look at this one.
     - Probably somehow related to the 2nd point, stdin text appears
       to be echoed by the IOF framework

This commit was SVN r5913.
2005-06-01 19:23:23 +00:00
Galen Shipman
aaa236052d changed function signatures to match the changes in mpool
This commit was SVN r5911.
2005-06-01 15:25:17 +00:00
Tim Woodall
4ce8f91b6a updates to bmi and pml I/F
This commit was SVN r5910.
2005-06-01 14:34:22 +00:00
Jeff Squyres
19b4479a0a Patch for systems with broken Fortran compilers (e.g., OS X Tiger
[10.4] with gfortran 4.0) who need to be able to add flags to compile
simple Fortran executables that use libc routines.

Notably, for Tiger with gfortran 4.0 installed, you'll need to:

    ./configure F77=gfortran FC=gfortran LIBS=-lSystemStubs

This commit was SVN r5909.
2005-06-01 10:53:44 +00:00
Jeff Squyres
346921e9e7 Add Makefile.am (and related support) for dynamic builds of bmi
components.

This commit was SVN r5908.
2005-06-01 09:31:08 +00:00
Tim Woodall
3e07a64ade don't allow synchronous request to complete until ack is received
This commit was SVN r5907.
2005-05-31 21:56:43 +00:00
Galen Shipman
2b2b8fa283 fixed mpool_base calls to include the mpool module.
This commit was SVN r5905.
2005-05-31 20:34:03 +00:00
George Bosilca
a285ecce5e PID's should be of type pid_t and should use the GPR union member
"pid", not "size".

This commit was SVN r5904.
2005-05-31 19:25:42 +00:00
Tim Prins
75b0b519d8 - Added functionality to MPI_Alloc_mem and MPI_Free_mem so that they
call the memory pool to do special memory allocations, and extended 
the mpool so that it will do the allocations and keep track of them in
a tree. Currently, if you pass MPI_INFO_NULL to MPI_Alloc_mem, we will 
try to allocate the memory and register it with as many mpools as 
possible. Alternatively, one can pass an info object with the names of 
the mpools as keys, and from these we decide which mpools to register 
the new memory with.

- fixed some comments in the allocator and fixed a minor bug

- extended the red black tree test and made a minor correction

This commit was SVN r5902.
2005-05-31 19:07:27 +00:00
Galen Shipman
459be82daa Removed generated file ..
This commit was SVN r5900.
2005-05-31 17:45:13 +00:00
George Bosilca
dba4d91d96 strcmp is defined on string.h on Linux so we have to include it.
This commit was SVN r5898.
2005-05-31 17:41:37 +00:00
Galen Shipman
4c208f7964 Common source files used by mpool and bmi
This commit was SVN r5897.
2005-05-31 17:09:55 +00:00
Galen Shipman
5ccaaf55e2 Initial checkin of VAPI allocator.
This commit was SVN r5896.
2005-05-31 17:08:41 +00:00
Galen Shipman
f16f9703a5 Modified the mpool and allocator to allow user defined data to be passed in and out of the mpool allocate functions; this is necessary if we use the mpool to allocate IB registered memory, as we need to pass in the hca handle and pass out the memory region handle.
This commit was SVN r5895.
2005-05-31 17:06:55 +00:00
Jeff Squyres
c80f54052e (copied from a mail that has a lengthy description of this commit)
I spoke with Tim about this the other day -- he gave me the green
light to go ahead with this, but it turned into a bigger job than I
thought it would be.  I revamped how the default RAS scheduling and
round_robin RMAPS mapping occurs.  The previous algorithms were pretty
brain dead, and ignored the "slots" and "max_slots" tokens in
hostfiles.  I considered this a big enough problem to fix it for the
beta (because there is currently no way to control where processes are
launched on SMPs).

There's still some more bells and whistles that I'd like to implement,
but there's no hurry, and they can go on the trunk at any time.  My
patches below are for what I considered "essential", and do the
following:

- honor the "slots" and "max-slots" tokens in the hostfile (and all
  their synonyms), meaning that we allocate/map until we fill slots,
  and if there are still more processes to allocate/map, we keep going
  until we fill max-slots (i.e., only oversubscribe a node if we have
  to).

- offer two different algorithms, currently supported by two new
  options to orterun.  Remember that there are two parts here -- slot
  allocation and process mapping.  Slot allocation controls how many
  processes we'll be running on a node.  After that decision has been
  made, process mapping effectively controls where the ranks of
  MPI_COMM_WORLD (MCW) are placed. Some of the examples given below
  don't make sense unless you remember that there is a difference
  between the two (which makes total sense, but you have to think
  about it in terms of both things):

1. "-bynode": allocates/maps one process per node in a round-robin
fashion until all slots on the node are taken.  If we still have more
processes after all slots are taken, then keep going until all
max-slots are taken.  Examples:

- The hostfile:

  eddie slots=2 max-slots=4
  vogon slots=4 max-slots=8

- orterun -bynode -np 6 -hostfile hostfile a.out

  eddie: MCW ranks 0, 2
  vogon: MCW ranks 1, 3, 4, 5

- orterun -bynode -np 8 -hostfile hostfile a.out

  eddie: MCW ranks 0, 2, 4
  vogon: MCW ranks 1, 3, 5, 6, 7
  -> the algorithm oversubscribes all nodes "equally" (until each
  node's max_slots is hit, of course)

- orterun -bynode -np 12 -hostfile hostfile a.out

  eddie: MCW ranks 0, 2, 4, 6
  vogon: MCW ranks 1, 3, 5, 7, 8, 9, 10, 11

2. "-byslot" (this is the default if you don't specify -bynode):
greedily takes all available slots on a node for a job before moving
on to the next node.  If we still have processes to allocate/schedule,
then oversubscribe all nodes equally (i.e., go round robin on all
nodes until each node's max_slots is hit).  Examples:

- The hostfile

  eddie slots=2 max-slots=4
  vogon slots=4 max-slots=8

- orterun -np 6 -hostfile hostfile a.out

  eddie: MCW ranks 0, 1
  vogon: MCW ranks 2, 3, 4, 5

- orterun -np 8 -hostfile hostfile a.out

  eddie: MCW ranks 0, 1, 2
  vogon: MCW ranks 3, 4, 5, 6, 7
  -> the algorithm oversubscribes all nodes "equally" (until max_slots
  is hit)

- orterun -np 12 -hostfile hostfile a.out

  eddie: MCW ranks 0, 1, 2, 3
  vogon: MCW ranks 4, 5, 6, 7, 8, 9, 10, 11

The above examples are fairly contrived, and it's not clear from them
that you can get different allocation answers in all cases (the
mapping differences are obvious).  Consider the following allocation
example:

- The hostfile

  eddie count=4
  vogon count=4
  earth count=4
  deep-thought count=4

- orterun -np 8 -hostfile hostfile a.out

  eddie: 4 slots will be allocated
  vogon: 4 slots will be allocated
  earth: no slots allocated
  deep-thought: no slots allocated

- orterun -bynode -np 8 -hostfile hostfile a.out

  eddie: 2 slots will be allocated
  vogon: 2 slots will be allocated
  earth: 2 slots will be allocated
  deep-thought: 2 slots will be allocated

This commit was SVN r5894.
2005-05-31 16:36:53 +00:00
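The two-phase behaviour described in r5894 (slot allocation first, then mapping of MPI_COMM_WORLD ranks) can be sketched in Python as follows. This is an illustration reconstructed from the examples in the commit message, not the actual RAS/RMAPS code. The names `allocate` and `map_ranks` and the `(name, slots, max_slots)` tuple layout are assumptions made for the sketch.

```python
def _round_robin(nodes, counts, remaining, limit_index):
    """Give one process per node per pass until every node hits its limit."""
    progressed = True
    while remaining > 0 and progressed:
        progressed = False
        for node in nodes:
            name, limit = node[0], node[limit_index]
            if remaining > 0 and counts[name] < limit:
                counts[name] += 1
                remaining -= 1
                progressed = True
    return remaining


def allocate(nodes, np, by_node):
    """Decide how many processes each node runs.

    nodes is a list of (name, slots, max_slots) tuples (assumed layout).
    """
    counts = {name: 0 for name, _, _ in nodes}
    remaining = np
    if by_node:
        # -bynode: one process per node in turn until all slots are taken.
        remaining = _round_robin(nodes, counts, remaining, limit_index=1)
    else:
        # -byslot: greedily fill each node's slots before moving on.
        for name, slots, _ in nodes:
            take = min(slots, remaining)
            counts[name] += take
            remaining -= take
    # Either way, oversubscribe all nodes equally up to max_slots.
    remaining = _round_robin(nodes, counts, remaining, limit_index=2)
    if remaining > 0:
        raise ValueError("not enough max_slots for the requested processes")
    return counts


def map_ranks(nodes, counts, by_node):
    """Place MCW ranks on nodes according to the allocated counts."""
    ranks = {name: [] for name, _, _ in nodes}
    left = dict(counts)
    rank = 0
    if by_node:
        # Cycle over the nodes, skipping those with no allocation left.
        while any(left.values()):
            for name, _, _ in nodes:
                if left[name] > 0:
                    ranks[name].append(rank)
                    rank += 1
                    left[name] -= 1
    else:
        # Consecutive ranks fill one node before the next.
        for name, _, _ in nodes:
            for _ in range(counts[name]):
                ranks[name].append(rank)
                rank += 1
    return ranks


hostfile = [("eddie", 2, 4), ("vogon", 4, 8)]
print(map_ranks(hostfile, allocate(hostfile, 8, by_node=True), by_node=True))
# {'eddie': [0, 2, 4], 'vogon': [1, 3, 5, 6, 7]}
```

Keeping allocation and mapping as separate functions mirrors the distinction the message draws: the four-node example only changes the allocation counts, while the two-node examples differ mainly in rank placement.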
Jeff Squyres
497580441d Per MPI-2:3.1, MPI_GET_VERSION can be called before MPI_INIT, so
remove the MPI_ERR_INIT_FINALIZE() macro.  Also check to see how we
invoke the errhandler if an error occurs (i.e., the action depends on
whether we're between MPI_INIT and MPI_FINALIZE or not).

This commit was SVN r5891.
2005-05-31 16:30:34 +00:00
Jeff Squyres
9c2554d8ce Since we allow the following keys:
cpu
count
slots

We should allow *max* versions of all of those, rather than just
slots-max (and its variations).

This commit was SVN r5889.
2005-05-27 17:28:30 +00:00
Jeff Squyres
843cd2dbac More work with Ralph -- now we think we have it right. Here's the
additions from his previous commit:

- Properly propagate error upwards if we have a losthost+other_node
  error
- Added logic to handle multiple instances of the same hostname
- Added logic to properly increment the slot count for multiple
  instances.  For example, a hostfile with:

  foo.example.com
  foo.example.com slots=4
  foo.example.com slots=8

  would result in a single host with a slot count of 13 (i.e., if no
  slot count is specified, 1 is assumed)
- Revised the localhost logic a bit -- some cases are ok (e.g.,
  specifying localhost multiple times is ok, as long as there are no
  other hosts)

This commit was SVN r5886.
2005-05-26 21:44:45 +00:00
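The slot accumulation rule from r5886 (repeated hostnames are merged, and an entry without a slot count contributes 1) can be illustrated with a small Python sketch. The function name `total_slots` is hypothetical, and only the `slots=` token is handled here; the real hostfile parser accepts more keys.

```python
def total_slots(hostfile_lines):
    """Sum the slot counts of repeated hostnames in a hostfile."""
    totals = {}
    for line in hostfile_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        host = fields[0]
        slots = 1  # default when no slot count is given
        for field in fields[1:]:
            key, _, value = field.partition("=")
            if key == "slots":
                slots = int(value)
        totals[host] = totals.get(host, 0) + slots
    return totals


print(total_slots([
    "foo.example.com",
    "foo.example.com slots=4",
    "foo.example.com slots=8",
]))
# {'foo.example.com': 13}
```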
George Bosilca
fa8889bafa little buggy thing ... hunted for hours ...
The problem was that the displacement was increased even when the current memcpy completely
succeeded. It is not a problem for most of the cases ... except when we completely finish a
data.

This commit was SVN r5885.
2005-05-26 21:44:24 +00:00
George Bosilca
0fbf302080 More output for debug.
This commit was SVN r5883.
2005-05-26 21:42:15 +00:00
George Bosilca
cd84c1cb10 Replace TAB with spaces.
This commit was SVN r5882.
2005-05-26 21:41:51 +00:00
Ralph Castain
cff4dcebc1 Short explanation: fix how we handle the "localhost" entry in the hostfile so that the Mac (and other multi-NIC systems) will work.
Long explanation: Jeff and I spent some time chasing this down today (mostly Jeff), and found that the Mac was having problems with the replacement of "localhost" with the local nodename when we read the hostfile. Jeff then found that the Linux documentation specifically warns about the vaguery of the value returned for "nodename" (see the man page for uname for details). Sooo....when we replaced "localhost" with the local "nodename", the system couldn't figure out what node we were referring to when we tried to launch.

Solution (borrowed from LAM): if the user includes "localhost" in the hostfile, then we do NOT allow any other entries in the hostfile - the presence of another entry will generate an error message and cause mpirun to gracefully exit. Obviously, then, if "localhost" is specified in the hostfile, then we are running the application locally.

This commit was SVN r5881.
2005-05-26 19:43:21 +00:00
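The "localhost" restriction from r5881 amounts to a simple validation pass over the hostnames read from the hostfile. A minimal Python sketch follows, assuming a hypothetical `check_localhost_rule` helper. It treats repeated "localhost" entries as acceptable, which is how the later r5886 message describes the revised logic.

```python
def check_localhost_rule(hostnames):
    """Raise ValueError if "localhost" is mixed with any other host."""
    distinct = set(hostnames)
    if "localhost" in distinct and len(distinct) > 1:
        others = sorted(distinct - {"localhost"})
        raise ValueError(
            'hostfile lists "localhost" together with other hosts: '
            + ", ".join(others)
        )


check_localhost_rule(["localhost", "localhost"])  # accepted: local run only
check_localhost_rule(["node1", "node2"])          # accepted: no localhost
```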
George Bosilca
08cff446f2 Few improvements:
- creating the stack now works even for contiguous data (with gaps around) and
  independently of the fragment size.
- add a TYPE argument to the PUSH_STACK macro. It's too obscure to explain it here :)
- in dt_add we avoid surrounding a datatype with loops if we can handle it by increasing the
  count of the datatype (only if the datatype contains one type element and if the extent
  matches). But it's enough to speed up a lot the packing/unpacking of all composed predefined
  datatypes (like MPI_COMPLEX and co.).
- in dt_module.c improve the handling of the flags for all composed predefined
  datatypes. There is still something to do for the Fortran datatypes but it will be on
  the next commit.

This commit was SVN r5879.
2005-05-26 17:32:18 +00:00
Jeff Squyres
6781900f98 (copied from an e-mail, just so that I don't have to re-type the
entire explanation ;-) )

Our Abaqus friends just pointed out another bug to me.  We have the
"-x" option to orterun to export environment variables to
newly-started processes.  However, it doesn't work if the environment
variable is already set in the target environment.  For example:

         mpirun -x LD_LIBRARY_PATH -np 2 a.out

The app context correctly contains LD_LIBRARY_PATH and its value, and
that app context correctly propagates out to the orted and is present
when we fork/exec a.out.  However, if LD_LIBRARY_PATH is already set
in the newly-started process' environment, the fork pls won't override
it with the value from the app context.

It really only has to do with the ordering of arguments in
ompi_environ_merge() -- when merging two env arrays together, we
"prefer" one set to the other if there are duplicate names.  I think
that if the user wants to override variables (even variables like
LD_LIBRARY_PATH), we should let them -- it's necessary for some
applications (like in Abaqus' case).  If they screw it up, it's their
fault (e.g., setting some LD_LIBRARY_PATH that won't work).

That being said, we should *not* allow them to override specific MCA
parameters that are necessary for startup -- that's easy to accomplish
by setting up that stuff *after* we merge in the context app
environment.

Also note that I am *only* speaking about the fork pls here -- so this
only applies to started ORTE job processes, not the orted.

So an easy re-order to do the following:

   env_copy = merge(environ and context->app)
   ompi_setenv(...MCA params necessary for startup..., env_copy)
   execve(..., env_copy)

does what we want.

This commit was SVN r5878.
2005-05-26 15:57:48 +00:00
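The ordering argument in r5878 can be made concrete with a short Python sketch that uses dictionaries in place of C environment arrays. `build_child_env` and the variable `OMPI_MCA_example` are hypothetical names chosen for illustration; the sketch shows the intended precedence only, not the fork pls implementation.

```python
def build_child_env(parent_env, app_env, startup_params):
    """Build the environment handed to the newly started process.

    Later updates win, so the order of the three steps is the whole point.
    """
    env = dict(parent_env)      # start from the inherited environment
    env.update(app_env)         # app context (-x variables) overrides it
    env.update(startup_params)  # startup MCA params are set last, so the
                                # user cannot override them
    return env


child = build_child_env(
    parent_env={"LD_LIBRARY_PATH": "/usr/lib", "HOME": "/home/user"},
    app_env={"LD_LIBRARY_PATH": "/opt/app/lib", "OMPI_MCA_example": "user"},
    startup_params={"OMPI_MCA_example": "startup"},
)
print(child["LD_LIBRARY_PATH"])   # /opt/app/lib
print(child["OMPI_MCA_example"])  # startup
```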
Jeff Squyres
0fb6121bfd After yet another round of discussions about why these classes are
split between OMPI and ORTE, added a lengthy comment to ompi_bitmap.h
explaining the reason why (and how it would be fine to re-merge them
-- if someone has the time) and references to it from all the other
relevant .h files.

This commit was SVN r5876.
2005-05-26 13:12:11 +00:00
Jeff Squyres
84e70e279c Remove bad free (doxy docs say that freeing the result of
ompi_cmd_line_get_param() is a Bad Thing) that causes seg faults.

This commit was SVN r5873.
2005-05-26 02:44:09 +00:00