1
1
Граф коммитов

62 Коммитов

Автор SHA1 Сообщение Дата
Jeff Squyres
25249c9843 Roll back r20926, r20941, r20950: Iain's changes from last week.
This has turned into an MPI spec interpretation issue.  :-(

Open MPI has intrepreted the spec one way for the past several years;
these commits reflect a different interpretation that changes how we
treat the EXTRA_STATE parameter to the Fortran attribute copy and
delete callbacks.  This new way breaks our internal copy of the Intel
Fortran attribute tests.  So after talking with Terry/Sun, we're going
to back out these changes (both here and on the v1.3 branch) until we
get further clarification from the Forum.

This commit was SVN r21028.

The following SVN revision numbers were found above:
  r20926 --> open-mpi/ompi@0a24eadaad
  r20941 --> open-mpi/ompi@045b0e8871
  r20950 --> open-mpi/ompi@73af921c22
2009-04-16 14:29:32 +00:00
Iain Bason
73af921c22 Update keyval_create functions.
The fix for #1864 in r20926 caused gcc to emit some warnings. Also,
Jeff Squyres pointed out parallel bugs in mpi_type_create_keyval and
mpi_win_create_keyval.

This commit was SVN r20950.

The following SVN revision numbers were found above:
  r20926 --> open-mpi/ompi@0a24eadaad
2009-04-07 13:47:52 +00:00
George Bosilca
045b0e8871 Remove two warnings about "cast from pointer to integer of different size".
This commit was SVN r20941.
2009-04-05 21:02:36 +00:00
Iain Bason
0a24eadaad Fix Fortran bindings for MPI_KEYVAL_CREATE and MPI_COMM_CREATE_KEYVAL.
The EXTRA_STATE parameter is passed by reference, and thus should be
dereferenced before it is stored.  Similarly, the stored value should
be passed by reference to the copy and delete routines.

This fixes trac:1864.

This commit was SVN r20926.

The following Trac tickets were found above:
  Ticket 1864 --> https://svn.open-mpi.org/trac/ompi/ticket/1864
2009-04-01 19:31:46 +00:00
Rainer Keller
04585ed8e3 - Explicit mark dependency
This commit was SVN r20771.
2009-03-12 22:48:41 +00:00
Rainer Keller
ec0ed48718 - Revert r20739
This commit was SVN r20742.

The following SVN revision numbers were found above:
  r20739 --> open-mpi/ompi@781caee0b6
2009-03-05 21:56:03 +00:00
Rainer Keller
781caee0b6 - First of two or three patches, in orte/util/proc_info.h:
Adapt orte_process_info to orte_proc_info, and
   change orte_proc_info() to orte_proc_info_init().
 - Compiled on linux-x86-64
 - Discussed with Ralph

This commit was SVN r20739.
2009-03-05 20:36:44 +00:00
Rainer Keller
811f2bd9b4 - As discussed on RFC, move the ompi_bitmap to the
opal layer.
   Add a check against a maximum (actually get rid of ifs internally to
   opal_bitmap.c) -- the functionality to set the current maximum size
   opal_bitmap_set_max_size() is currently only used in attribute.c
   to set the maximum OMPI_FORTRAN_HANDLE_MAX...

   Tested on linux/x86-64 with intel-tests with all_tests_no_perf_f
   run with 6 procs.
   Let's look into MTT as well...

This commit was SVN r20708.
2009-03-03 22:25:13 +00:00
Ralph Castain
ba5498cdc6 Repair the MPI-2 dynamic operations. This includes:
1. repair of the linear and direct routed modules

2. repair of the ompi/pubsub/orte module to correctly init routes to the ompi-server, and correctly handle failure to correctly parse the provided ompi-server URI

3. modification of orterun to accept both "file" and "FILE" for designating where the ompi-server URI is to be found - purely a convenience feature

4. resolution of a message ordering problem during the connect/accept handshake that allowed the "send-first" proc to attempt to send to the "recv-first" proc before the HNP had actually updated its routes.

Let this be a further reminder to all - message ordering is NOT guaranteed in the OOB

5. Repair the ompi/dpm/orte module to correctly init routes during connect/accept.

Reminder to all: messages sent to procs in another job family (i.e., started by a different mpirun) are ALWAYS routed through the respective HNPs. As per the comments in orte/routed, this is REQUIRED to maintain connect/accept (where only the root proc on each side is capable of init'ing the routes), allow communication between mpirun's using different routing modules, and to minimize connections on tools such as ompi-server. It is all taken care of "under the covers" by the OOB to ensure that a route back to the sender is maintained, even when the different mpirun's are using different routed modules.

6. corrections in the orte/odls to ensure proper identification of daemons participating in a dynamic launch

7. corrections in build/nidmap to support update of an existing nidmap during dynamic launch

8. corrected implementation of the update_arch function in the ESS, along with consolidation of a number of ESS operations into base functions for easier maintenance. The ability to support info from multiple jobs was added, although we don't currently do so - this will come later to support further fault recovery strategies

9. minor updates to several functions to remove unnecessary and/or no longer used variables and envar's, add some debugging output, etc.

10. addition of a new macro ORTE_PROC_IS_DAEMON that resolves to true if the provided proc is a daemon

There is still more cleanup to be done for efficiency, but this at least works.

Tested on single-node Mac, multi-node SLURM via odin. Tests included connect/accept, publish/lookup/unpublish, comm_spawn, comm_spawn_multiple, and singleton comm_spawn.

Fixes ticket #1256

This commit was SVN r18804.
2008-07-03 17:53:37 +00:00
Dan Lacher
98f70d6318 Convert the C++ Comm, Datatype and Winn keyval creation and intercept callbacks
to *not* use the STL as well as removing the STL use from the error handler
routines.  This was removing the STL from the C++ bindings (Solaris has 2
versions of the STL; if OMPI uses one and an MPI application wants to use
another, Bad Things happen).

The main idea is to wrap up the C++ callback function pointers and the user's
extra_state into our own struct that is passed as the extra_state to the C
keyval registration along with the intercept routines in intercepts.cc. When the
C++ intercepts are activated, they unwrap the user's callback and extra state
and call them.

This commit was SVN r17409.
2008-02-10 19:29:25 +00:00
Jeff Squyres
3616b03eb3 Fix a comment -- we implemented windows a long time ago.
This commit was SVN r16657.
2007-11-05 13:43:53 +00:00
Jeff Squyres
b5abb12c98 Commit Ralph's fix for MPI_APPNUM.
This commit was SVN r16371.
2007-10-06 18:54:43 +00:00
Tim Prins
c46ed1d5d4 Make it so the universe size is passed through the ODLS instead of through a gpr trigger during MPI init. This matches what is currently being done with the app number.
The default odls has been updated and works fine. The process odls has been updated, but I could not verify its operation. The bproc ODLS has not been updated yet. Ralph will look at it soon.

This commit was SVN r15257.
2007-07-02 01:33:35 +00:00
Brian Barrett
84d1512fba Add the potential for doing some basic error checking on mutexes during
single threaded builds.  In its default configuration, all this does
is ensure that there's at least a good chance of threads building
based on non-threaded development (since the variable names will be
checked).  There is also code to make sure that a "mutex" is never
"double locked" when using the conditional macro mutex operations.
This is off by default because there are a number of places in both
ORTE and OMPI where this alarm spews mega bytes of errors on a
simple test.  So we have some work to do on our path towards
thread support.

Also removed the macro versions of the non-conditional thread locks,
as the only places they were used, the author of the code intended
to use the conditional thread locks.  So now you have upper-case
macros for conditional thread locks and lowercase functions for
non-conditional locks.  Simple, right? :).

This commit was SVN r15011.
2007-06-12 16:25:26 +00:00
Sven Stork
fe3b08004e - export symbols that are needed by the fortran libs
This commit was SVN r14527.
2007-04-26 09:34:41 +00:00
Tim Prins
bddd06bcdb pedantic formatting...
This commit was SVN r13766.
2007-02-23 00:54:41 +00:00
Jeff Squyres
260f1fd468 Fixes trac:817
The C++ bindings were not tracking keyvals properly -- they were
freeing some internal meta data when Free_keyval() was called, not
when the keyval was actually destroyed (keyvals are refcounted in the
C layer, just like all other MPI objects, because they can live for
long after their corresponding Free call is invoked).  This commit
fixes this problem and several other things:

 * Add infrastructure on the ompi_attribute_keyval_t for an "extra"
   destructor pointer that will be invoked during the "real"
   constructor (i.e., when OBJ_RELEASE puts the refcount to 0).  This
   allows calling back into the C++ layer to release meta data
   associated with the keyval.
 * Adjust all cases where keyvals are created to pass in relevant
   destructors (NULL or the C++ destructor).
 * Do essentially the same for MPI::Comm, MPI::Win, and MPI:Datatype:
   * Move several functions out of the .cc file into the _inln.h file
     since they no longer require locks
   * Make the 4 Create_keyval() functions call a common back-end
     keyval creation function that does the Right Thing depending on
     whether C or C++ function pointers were used for the keyval
     functions.  The back-end function does not call the corresponding
     C MPI_*_create_keyval function, but rather does the work itself
     so that it can associate a "destructor" callback for the C++
     bindings for when the keyval is actually destroyed.
   * Change a few type names to be more indicative of what they are
     (mostly dealing with keyvals [not "keys"]).
 * Add the 3 missing bindings for MPI::Comm::Create_keyval().
 * Remove MPI::Comm::comm_map (and associated types) because it's no
   longer necessary in the intercepts -- it was a by-product of being
   a portable C++ bindings layer.  Now we can just query the C layer
   directly to figure out what type a communicator is.  This solves
   some logistics / callback issues, too.
 * Rename several types, variables, and fix many comments in the
   back-end C attribute implementation to make the names really
   reflect what they are (keyvals vs. attributes).  The previous names
   heavily overloaded the name "key" and were ''extremely''
   confusing.

This commit was SVN r13565.

The following Trac tickets were found above:
  Ticket 817 --> https://svn.open-mpi.org/trac/ompi/ticket/817
2007-02-08 23:50:04 +00:00
Ralph Castain
0a5d41857a Complete next round of message size reduction: "strip" the descriptive info from the returned values. I have now added a flag to the gpr address mode (ORTE_GPR_STRIPPED) that instructs the gpr to not include segment names or tokens in the returned gpr_value_t objects.
I found only two places that were looking at the tokens:

1. the odls - we used the tokens to separately process the globals container data from everything else. In this case, I left the subscription that returned the globals data alone, but "stripped" the subscription that returned the launch data for the procs. These subscriptions have nothing to do with the xcast message.

2. the pml_base_modex - the callback function was getting process names from the returned tokens. Actually, this function was doing a very bad thing - it was assuming that the first token returned was *always* the process name. This is currently true, but is one of those assumptions that someone could have easily changed - and suddenly found the system inexplicably failing. I modified the function to (a) get the name sent back to us, (b) "stripped" the value structures of tokens and segment strings, and (c) correctly obtained process names from the returned values. I also reindented the heck out of the code so it was legible (at least, to my old eyes).

This commit was SVN r12813.
2006-12-09 23:10:25 +00:00
Ralph Castain
a1153fdc8f Eliminate virtually all of the attribute_predefined data from the STG1 message. We now compute the total number of slots allocated to us and save that in the registry - the attributed_predefined then retrieves it via the STG1 message. The app_num is passed via the process_info structure, which gets the value from the ODLS in the environment.
Obviously, people like bproc will have to get the app_num via another avenue...but that's a problem for another day. Several options are easily available.

This commit was SVN r12788.
2006-12-07 03:11:20 +00:00
Edgar Gabriel
1359ba9b13 Rewriting much of the errorcode and errorclass code, since
- we have to be able to attach a string to an error class, not just to an
 error code
 - according to MPI-2 the attribute MPI_LASTUSEDCODE has to be updated
  everytime you add a new code or a new class. Thus, you have to have single
  list for both. 

Thus, we got rid of the error_class structure. In the error-code structure, we
can distinguish whether we are dealing with an error code or an error class by
looking at the err->code element of the structure. In case its value is
MPI_UNDEFINED, the according entry is a class, else it is an error code. All
predefined error codes have the code and the class field set to the same
value.

The test MPI_Add_error_class1 passes now.

Fixes trac:418

This commit was SVN r12764.

The following Trac tickets were found above:
  Ticket 418 --> https://svn.open-mpi.org/trac/ompi/ticket/418
2006-12-05 19:07:02 +00:00
George Bosilca
a0ed53d70b Make the compilers happy.
This commit was SVN r12729.
2006-12-03 00:19:11 +00:00
Ralph Castain
6d6cebb4a7 Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things).
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.

I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).

This commit was SVN r12597.
2006-11-14 19:34:59 +00:00
Ralph Castain
8b6921f297 Initialize the rank variable before it is used.
This commit was SVN r12565.
2006-11-13 02:37:12 +00:00
Ralph Castain
17c71f8d2a Fix ticket #545
Setup subscriptions to correctly return the MPI_APPNUM attribute.

Fix an unreported bug that was found. The universe size was incorrectly defined in the attributes code. As coded, it looked for size_t values and based its size computation on those numbers. Unfortunately, the node_slots value had been changed to an orte_std_cntr_t awhile back! So the universe size was never updated.

Update the hello_nodename test to check for MPI_APPNUM.

Add a definition to ns_types for ORTE_PROC_MY_NAME - just a shortcut for orte_process_info.my_name. Brought over from ORTE 2.0 as it will be used extensively there.

This commit was SVN r12377.
2006-10-31 21:29:07 +00:00
George Bosilca
06563b5dec Last set of explicit conversions. We are now close to the zero warnings on
all platforms. The only exceptions (and I will not deal with them
anytime soon) are on Windows:
- the write functions which require the length to be an int when it's
  a size_t on all UNIX variants.
- all iovec manipulation functions where the iov_len is again an int
  when it's a size_t on most of the UNIXes.
As these only happens on Windows, so I think we're set for now :)

This commit was SVN r12215.
2006-10-20 03:57:44 +00:00
George Bosilca
3f0a7cad9e The last patch for Windows support. Mostly casting and conversion to C++ friendly headers.
This commit was SVN r11400.
2006-08-24 16:38:08 +00:00
Ralph Castain
5dfd54c778 With the branch to 1.2 made....
Clean up the remainder of the size_t references in the runtime itself. Convert to orte_std_cntr_t wherever it makes sense (only avoid those places where the actual memory size is referenced).

Remove the obsolete oob barrier function (we actually obsoleted it a long time ago - just never bothered to clean it up).

I have done my best to go through all the components and catch everything, even if I couldn't test compile them since I wasn't on that type of system. Still, I cannot guarantee that problems won't show up when you test this on specific systems. Usually, these will just show as "warning: comparison between signed and unsigned" notes which are easily fixed (just change a size_t to orte_std_cntr_t).

In some places, people didn't use size_t, but instead used some other variant (e.g., I found several places with uint32_t). I tried to catch all of them, but...

Once we get all the instances caught and fixed, this should once and for all resolve many of the heterogeneity problems.

This commit was SVN r11204.
2006-08-15 19:54:10 +00:00
George Bosilca
a08f087447 Add the last value to the switch.
This commit was SVN r10738.
2006-07-11 15:59:38 +00:00
George Bosilca
623dd3357d Create another enum item that means the attribute is not initialized. The problem,
was that the compilers complain about setting a variable of an enum type to something
not in the enum.

This commit was SVN r10737.
2006-07-11 15:28:32 +00:00
Jeff Squyres
80597b9d08 George found that a whole bunch of lines still had tabs in them
(apparently from long, long ago).  This commit is solely changing tabs
to spaces -- no functionality or other changes.

This commit was SVN r10731.
2006-07-11 13:57:39 +00:00
Jeff Squyres
429c25095e Fix for bug #176.
* Fix for two problems introduced by r10661:
   1. ensure to use the key ''after'' it is initialized (sigh).
   1. handle the case where we free the attrkey before it is fully
      initialized (i.e., some other error causes us to free it).  In
      this case, don't try to remove the key from the hash map,
      because it won't exist.
 * More accurate zeroing in the keyval constructor
   (ompi_attrkey_item_constructor)
 * Widen the scope of the alock such that the attrkey destructor does
   not need to acquire it.  Instead, assume that the caller already
   has it.
 * Add a comment about why the keyval may get destroyed as the result
   of deleting an attribute (so that I don't have to figure it out
   again the next time I read this code :-) )

This commit was SVN r10664.

The following SVN revision numbers were found above:
  r10661 --> open-mpi/ompi@fdba2c9df0
2006-07-05 20:23:08 +00:00
Jeff Squyres
fdba2c9df0 Per the analysis in bug #184, move some assignments around to effect
thread safety.  This is likely to be only the first of multiple steps
for complete thread safety in the MPI attribute code.  All tests
[continue to] pass the intel and ibm attribute tests.

Also renamed a variable from "attr" to "attrkey" to reflect that it's
a keyval, not an attribute.

This commit was SVN r10661.
2006-07-05 17:37:17 +00:00
Jeff Squyres
3e86381533 Add thread protection -- must only construct the alock when we have
threading support.

This commit was SVN r10138.
2006-05-31 13:48:21 +00:00
Gleb Natapov
d2c7bcfbe1 init alock mutex before use.
This commit was SVN r10135.
2006-05-31 06:37:39 +00:00
Jeff Squyres
f6bbe033f0 The output of the copy function is a logical, not an int. So we need
to use the appropriate macro for all the Fortran .TRUE. handling, or
things get misinterpeted and, with some compilers, it will look like
the attribute wasn't copied properly.

This commit was SVN r9536.
2006-04-05 19:00:11 +00:00
George Bosilca
85bb1a9c90 Add one more argument to the copy functions for the MPI objects. As this argument
is the last one on the list and as on C the caller "make it right" this addition
will not affect the way we handle the user defined copy functions. Only the C
version of the function has this additional parameter. As it represent the pointer
to the newly created MPI object It hold the key to allow us to modify the new
object (communicator, window or type) depending on some key stored on the initial
communicator.

This commit was SVN r9371.
2006-03-23 04:47:14 +00:00
Rainer Keller
4b1194056f - Upon Finalize, also release the memory of predefined attributes.
This commit was SVN r9169.
2006-02-27 15:15:48 +00:00
Brian Barrett
566a050c23 Next step in the project split, mainly source code re-arranging
- move files out of toplevel include/ and etc/, moving it into the
    sub-projects
  - rather than including config headers with <project>/include, 
    have them as <project>
  - require all headers to be included with a project prefix, with
    the exception of the config headers ({opal,orte,ompi}_config.h
    mpi.h, and mpif.h)

This commit was SVN r8985.
2006-02-12 01:33:29 +00:00
Ralph Castain
4b9f015c0b Merge in the new data support subsystem for ORTE. MPI folks should not notice a difference. Longer explanation will be sent to developers mailing list.
This commit was SVN r8912.
2006-02-07 03:32:36 +00:00
Brian Barrett
b1d2424013 Merge in present work on the MPI-2 onesided chapter. The current code is not
complete, but stable enough that it will have no impact on general development,
so into the trunk it goes.  Changes in this commit include:

 - Remove the --with option for disabling MPI-2 onesided support.  It
   complicated code, and has no real reason for existing
 - add a framework osc (OneSided Communication) for encapsulating
   all the MPI-2 onesided functionality
 - Modify the MPI interface functions for the MPI-2 onesided chapter
   to properly call the underlying framework and do the required
   error checking
 - Created an osc component pt2pt, which is layered over the BML/BTL
   for communication (although it also uses the PML for long message
   transfers).  Currently, all support functions, all communication
   functions (Put, Get, Accumulate), and the Fence synchronization
   function are implemented.  The PWSC active synchronization
   functions and Lock/Unlock passive synchronization functions are
   still not implemented

This commit was SVN r8836.
2006-01-28 15:38:37 +00:00
George Bosilca
6fb4ce5e2e Some dependencies cleanups (there were on hold for a while).
This commit was SVN r8425.
2005-12-09 05:14:18 +00:00
Jeff Squyres
42ec26e640 Update the copyright notices for IU and UTK.
This commit was SVN r7999.
2005-11-05 19:57:48 +00:00
Jeff Squyres
9e723b8519 Remove some compiler warnings
This commit was SVN r7973.
2005-11-03 15:26:43 +00:00
Brian Barrett
1302cb4072 The next in a long line of crazed build system changes from Brian. This was
originally suggested by Ralf Wildenhues, to try to speed autogen, configure,
and make (and possibly even make install).  Use automake's include directive
to drastically reduce the number of Makefile files (although the number of
Makefile.am files is the same - most are just included in a top-level
Makefile.am).  Also use an Automake SUBDIRs feature to eliminate the
dynamic-mca tree, which was no longer really needed.  This makes adding
a framework easier (since you don't have to remember the dynamic-mca
tree) and makes building faster (as make doesn't have to recurse through
the dynamic-mca tree)

This commit was SVN r7777.
2005-10-17 00:21:10 +00:00
George Bosilca
948683215b And more fixes ...
This commit was SVN r7321.
2005-09-12 20:25:01 +00:00
Brian Barrett
ed56e743b7 * update configure.ac to use the modern version of AC_INIT and
AM_INIT_AUTOMAKE, instead of the deprecated version.
* Work around dumbness in modern AC_INIT that requires the version
  number to be set at autoconf time (instead of at configure time, as
  it was before).  Set the version number, minus the subversion r number,
  at autoconf time.  Override the internal variables to include the r
  number (if needed) at configure time.  Basically, the right thing
  should always happen.  The only place it might not is the version
  reported as part of configure --help will not have an r number.
* Since AM_INIT_AUTOMAKE taks a list of options, no need to specify
  them in all the Makefile.am files.
* Addes support for subdir-objects, meaning that object files are put
  in the directory containing source files, even if the Makefile.am is
  in another directory.  This should start making it feasible to
  reduce the number of Makefile.am files we have in the tree, which
  will greatly reduce the time to run autogen and configure.

This commit was SVN r7211.
2005-09-07 05:54:53 +00:00
Ralph Castain
96f4bb7a63 Hey, sports fans!! Guess what??
Here's the huge registry check-in you've all been waiting for with baited breath. The revised version sends a single message to all processes at the various stage gates, thus making the startup much more scalable. I could provide you with all the tawdry details, but won't for now - you are welcome to ask, though, and I'll merrily bore your ears to tears.

In addition, the commit contains the following:

1. set the ignore properties on ompi/debuggers and orte/mca/pls/poe

2. Added simplified subscribe and put functions to the registry's API. I have also converted all of the ompi functions that registered subscriptions to the new API, and caught their associated put's as well.

In a follow-on commit, I'll be adding support for George's hetero arch registry subscription (wanted to get this one in first).

This commit was SVN r7118.
2005-09-01 01:07:30 +00:00
Jeff Squyres
c9cdb36b0b Finally get this right: move orte_sys_info.[ch] back into the orte
tree.
- fix up #include's throughout the tree (yay contrib/search_replace.pl!)
- remove a few extraneous #include's
- remove orte_sys_info*() from opal_init()/opal_finalize() (it's
  already in orte_init_stage1() and orte_system_finalize())
- remove dependencies in opal on orte_system_info -- util/os_path.c
  and util/os_create_dirpath.c (they only used path_sep, anyway --
  easily changed to #defines)

This commit was SVN r7059.
2005-08-26 21:03:41 +00:00
Brian Barrett
f273d84b1b * update ob1 to direct call
* don't know what I was thinking, but can't use the MCA_PML_CALL macro on
  the two data values, as they don't have things that the macro can
  expand into

This commit was SVN r6868.
2005-08-14 03:14:20 +00:00
Jeff Squyres
cf16a521c8 Ensure to get ompi/include/constants.h
This commit was SVN r6845.
2005-08-12 21:42:07 +00:00