1
1
Граф коммитов

395 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
bd0b13221b Cleanup ompi_info change to silence compiler warning
This commit was SVN r29436.
2013-10-14 16:57:50 +00:00
Mike Dubman
2141e9e6b4 tools: Add oshmem_info utility
Reworked ompi_info tool to be close with orte_info implementation.
ompi_info_register_types(), ompi_info_close_components() and
ompi_info_show_ompi_version() are moved to runtime/ompi_info_support.c.

Added runtime/oshmem_info_support layer that exports following api to be
used into oshmem_info tool as
oshmem_info_register_types()
oshmem_info_register_framework_params()
oshmem_info_close_components()
oshmem_info_show_oshmem_version()
These functions call ompi_info_support related interfaces as long as
Oshmem supports Open MPI/SHMEM combination.

Now orte_info/ompi_info/oshmem_info have identical implementation approach.

Possible improvement:
OSHMEM processing of --config option is the same as OMPI`s (code is duplicated).
Probably list of info_support interfaces can be extended by xxx_info_do_config().
developed by Igor, reviewed by miked

This commit was SVN r29429.
2013-10-12 19:03:32 +00:00
Jeff Squyres
c065839d1f Make the Java wrapper compiler behave like the other wrapper compilers
(specifically, with regards to the --showme flag and return code).

This commit was SVN r29270.
2013-09-26 22:48:51 +00:00
Nathan Hjelm
c699ee7812 Update the ompi_info man page with information about variable levels
and improve the behavior of ompi_info.

This commit changes the default behavior of ompi_info --all when a
level is not specified. Instead of assuming level 1 in this case we
now assume level 9. This change is due to feedback from the community
after the introduction of the --level option.

I also added a new option: --selected-only. This option will limit the
displayed variables to components that can be selected (ie. if there
is a selection parameter set-- btl self,sm)

cmr=v1.7.3:reviewer=jsquyres

This commit was SVN r29070.
2013-08-27 19:11:37 +00:00
Ralph Castain
a200e4f865 As per the RFC, bring in the ORTE async progress code and the rewrite of OOB:
*** THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE ***

Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro.

***************************************************************************************

I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week.

The code is in  https://bitbucket.org/rhc/ompi-oob2


WHAT:    Rewrite of ORTE OOB

WHY:       Support asynchronous progress and a host of other features

WHEN:    Wed, August 21

SYNOPSIS:
The current OOB has served us well, but a number of limitations have been identified over the years. Specifically:

* it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code)

* we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface.

* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients

* there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort

* only one transport (i.e., component) can be "active"


The revised OOB resolves these problems:

* async progress is used for all application processes, with the progress thread blocking in the event library

* each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on")

* multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC.

* a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions.

* opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object

* NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions

* obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel

* the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport

* routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active

* all blocking send/recv APIs have been removed. Everything operates asynchronously.


KNOWN LIMITATIONS:

* although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline

* the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker

* routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways

* obviously, not every error path has been tested nor necessarily covered

* determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when *all* transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost.

* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways

* the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC

This commit was SVN r29058.
2013-08-22 16:37:40 +00:00
Jeff Squyres
4d9da92e60 Fixes trac:376: bu default the wrappr compilers will enable rpath support
in generated executables on systems that support it.  Use
--disable-wrapper-rpath to disable this behavior.  See text in
README about --disable-wrapper-rpath for more details.

This commit was SVN r28479.

The following Trac tickets were found above:
  Ticket 376 --> https://svn.open-mpi.org/trac/ompi/ticket/376
2013-05-11 00:49:17 +00:00
Ralph Castain
ae68a953f4 Sigh - one more place
This commit was SVN r28447.
2013-05-05 00:25:14 +00:00
Ralph Castain
5d7a93c032 Add the ability to use an external version of libevent. Clearly not recommended at this time. I've verified that it works in limited scenarios, but more thorough testing and performance impacts need to be assessed.
Interesting how many includes had to be fixed here and there to fill in missing dependencies :-)

This commit was SVN r28411.
2013-04-29 17:02:37 +00:00
Nathan Hjelm
c50b99005d fix typo in opal_info_show_component_version and clean up more from ompi_info
This commit was SVN r28389.
2013-04-24 22:07:06 +00:00
Nathan Hjelm
4896b3bc4b clean up some ompi_info code
This commit was SVN r28388.
2013-04-24 21:37:24 +00:00
Ralph Castain
30850f3280 Fix the @#$!@# ompi_info --version option. Via long chat with Jeff, simplify this option a lot by dumping all the silly suboptions it covered. Instead, just provide the very basic "ompi_info -V|--version" like all other tools do
cmr:v1.7.2

This commit was SVN r28381.
2013-04-24 17:46:28 +00:00
Nathan Hjelm
bccf8c657a Per RFC add initial support for the MPI 3.0 tools interface.
Current MPI_T support:
  - Full cvar interface.
  - Full categories interface.
  - No pvar support at this time.

This commit was SVN r28376.
2013-04-24 15:59:23 +00:00
Nathan Hjelm
538a4f92d3 make ompi_info print out mpi_ variables
This commit was SVN r28328.
2013-04-11 21:23:16 +00:00
Nathan Hjelm
9d4a26f47d Update OMPI frameworks to use the MCA framework system.
Notes:
  - This commit also eliminates the need for an available components list in use
    in several frameworks. None of the code in question was making use of the
    priority field of the priority component list item so these extra lists were
    removed.
  - Cleaned up selection code in several frameworks to sort lists using opal_list_sort.
  - Cleans up the ompi/orte-info functions. Expose the functions that construct the
    list of params so they can be used elsewhere.

patches for mtl/portals4 from brian

missed a few output variables in openib

This commit was SVN r28241.
2013-03-27 21:17:31 +00:00
Nathan Hjelm
cf377db823 MCA/base: Add new MCA variable system
Features:
 - Support for an override parameter file (openmpi-mca-param-override.conf).
   Variable values in this file can not be overridden by any file or environment
   value.
 - Support for boolean, unsigned, and unsigned long long variables.
 - Support for true/false values.
 - Support for enumerations on integer variables.
 - Support for MPIT scope, verbosity, and binding.
 - Support for command line source.
 - Support for setting variable source via the environment using
   OMPI_MCA_SOURCE_<var name>=source (either command or file:filename)
 - Cleaner API.
 - Support for variable groups (equivalent to MPIT categories).

Notes:
 - Variables must be created with a backing store (char **, int *, or bool *)
   that must live at least as long as the variable.
 - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of
   mca_base_var_set_value() to change the value.
 - String values are duplicated when the variable is registered. It is up to
   the caller to free the original value if necessary. The new value will be
   freed by the mca_base_var system and must not be freed by the user.
 - Variables with constant scope may not be settable.
 - Variable groups (and all associated variables) are deregistered when the
   component is closed or the component repository item is freed. This
   prevents a segmentation fault from accessing a variable after its component
   is unloaded.
 - After some discussion we decided we should remove the automatic registration
   of component priority variables. Few component actually made use of this
   feature.
 - The enumerator interface was updated to be general enough to handle
   future uses of the interface.
 - The code to generate ompi_info output has been moved into the MCA variable
   system. See mca_base_var_dump().

opal: update core and components to mca_base_var system
orte: update core and components to mca_base_var system
ompi: update core and components to mca_base_var system

This commit also modifies the rmaps framework. The following variables were
moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode,
rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables.

This commit was SVN r28236.
2013-03-27 21:09:41 +00:00
Ralph Castain
a4b6fb241f Remove all remaining vestiges of the Windows integration
This commit was SVN r28137.
2013-02-28 17:31:47 +00:00
Brian Barrett
504a6d036f * Rather than use the extra_includes directive, add the extra includes (which is really just -I${includedir}/openmpi/ for devel headers) to CPPFLAGS, since all the other necessary -Is for devel headers (like libevent and hwloc) are added to CPPFLAGS.
* Clean up ${includedir} and ${libdir} for script wrapper compilers
* Update script wrapper compilers to work like the C wrapper compilers w.r.t static and dynamic linking
* Remove the ORTE script wrapper compilers since they didn't support the ${includedir} stuff and Ralph said they weren't used anymore.

This commit was SVN r28052.
2013-02-13 00:33:05 +00:00
Joshua Ladd
70ad711337 Backing out the Open SHMEM project
This commit was SVN r28050.
2013-02-12 17:45:27 +00:00
Mike Dubman
ff384daab4 Added new project: oshmem.
This commit was SVN r28048.
2013-02-12 15:33:21 +00:00
Brian Barrett
b8442ba505 Revamp the handling of wrapper compiler flags. The user flags, main configure
flags, and mca flags are kept seperate until the very end.  The main configure
wrapper flags should now be modified by using the OPAL_WRAPPER_FLAGS_ADD
macro.  MCA components should either let <framework>_<component>_{LIBS,LDFLAGS}
be copied over OR set <framework>_<component>_WRAPPER_EXTRA_{LIBS,LDFLAGS}.
The situations in which WRAPPER CPPFLAGS can be set by MCA components was
made very small to match the one use case where it makes sense.

This commit was SVN r27950.
2013-01-29 00:00:43 +00:00
Brian Barrett
f42783ae1a Move the RTE framework change into the trunk. With this change, all non-CR
runtime code goes through one of the rte, dpm, or pubsub frameworks.

This commit was SVN r27934.
2013-01-27 23:25:10 +00:00
Brian Barrett
05dc4431d9 Don't link orte into fortran in the dynamic case
This commit was SVN r27885.
2013-01-21 23:05:18 +00:00
Ralph Castain
b1925d35c7 Silence asprintf warnings in ompi_info, add libs to build under Ubuntu
This commit was SVN r27719.
2012-12-23 19:54:44 +00:00
Jeff Squyres
c5b0bcd9f7 Refs trac:3422
* Add some comments in the *-wrapper-data-txt.in files just so that
   someone doesn't forget in the future why we link in what we do in
   the MPI and ORTE wrapper compilers.
 * Update ompi_wrapper_script.in to match the new behavior.
 * Update orte_wrapper_script.in to support --openmpi:linkall (which
   is a no-op in this case)

This commit was SVN r27672.

The following Trac tickets were found above:
  Ticket 3422 --> https://svn.open-mpi.org/trac/ompi/ticket/3422
2012-12-14 16:34:20 +00:00
Jeff Squyres
f779b1ded9 Put back the static-library-detection stuff from r27668, with some
additional functionality.  Rationale (refs trac:3422):

 * Normal MPI applications only ever use the MPI API. Hence, -lmpi is
   sufficient (they'll never directly call ORTE or OPAL
   functions). This is arguably the most common case.
 * That being said, we do have some test programs (e.g., those in
   orte/test/mpi) that call MPI functions but also call ORTE/OPAL
   functions. I've also written the occasional MPI test program that
   calls opal_output, for example (there even might be a few tests in
   the IBM test suite that directly call ORTE/OPAL functions).
   * Even though this is not a common case, these applications should
     also compile/link with mpicc.
   * So we should add a --openmpi:linkall option that will also link
     in whatever is necessary to call ORTE/OPAL functions
   * Yes, we could hard-code "-lopen-rte -lopen-pal" in Makefiles, but
     we do reserve the right to change those library names and/or add
     others someday, so it's better to abstract out the names and let
     the wrapper supply whatever is necessary.
 * ORTE programs, however, are different. They almost always call OPAL
   functions (e.g., if they want to send a message, they must use the
   OPAL DSS). As such, it seems like the ORTE programs should always
   link in OPAL.

Therefore:

 * Add undocumented --openmpi:linkall flag to the wrapper compilers.
   See the comment in opal_wrapper.c for an explanation of what it
   does.  This flag is only intended for Open MPI developers -- not
   end users.  That's why it's undocumented.
 * Update orte/test/mpi/Makefile.am to add --openmpi:linkall
 * Make ortecc/ortec++'s wrapper data text files always explicitly
   link in libopen-pal

This commit was SVN r27670.

The following SVN revision numbers were found above:
  r27668 --> open-mpi/ompi@cf845897aa

The following Trac tickets were found above:
  Ticket 3422 --> https://svn.open-mpi.org/trac/ompi/ticket/3422
2012-12-13 22:31:37 +00:00
Jeff Squyres
cf845897aa Temporarily revert r27662 and r27667 because something wonky is
happening on OS X.  Grumble...

This commit was SVN r27668.

The following SVN revision numbers were found above:
  r27662 --> open-mpi/ompi@97cc916007
  r27667 --> open-mpi/ompi@529f6244ca
2012-12-11 23:08:14 +00:00
Jeff Squyres
97cc916007 Per discussion at the Open MPI developer meeting last week:
1. Restore libopen-pal.la, libopen-rte.la, and libmpi.la to be
    separate entities (i.e., don't have libopen-rte.la include
    libopen-pal.la, and don't have libmpi.la include libopen-pal.la).
    Yay!
 1. Consequently, make the wrapper compilers look for flags indicating
    that the user wants to compile statically (currently: -static,
    !--static, -Bstatic, and "-Wl," in front of all of those).  If it
    is, follow a 6-way matrix for determinining which libraries to
    list on the underlying command line.
 1. To support that, add the name of a token static and dynamic
    library to look for in each of the wrapper compiler data files.
 1. Fix a long-standing typo in the opalcc wrapper data file.

This commit was SVN r27662.
2012-12-11 01:46:59 +00:00
Nathan Hjelm
a427a7e727 do not include c99 flag in compiler wrappers
This commit was SVN r27625.
2012-11-20 19:33:14 +00:00
Ralph Castain
e0ff5d9c05 Output *all* of the thread settings
This commit was SVN r27623.
2012-11-18 18:10:19 +00:00
Nathan Hjelm
2acd0f83de Revert "Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter".
It appears the problem was not with the command line parser but the rsh plm. I don't know why this problem was not occuring before the command line parser changes but it appears to be resolved now.

This commit was SVN r27527.

The following SVN revision numbers were found above:
  r27451 --> open-mpi/ompi@d59034e6ef
  r27456 --> open-mpi/ompi@ecdbf34937
2012-10-30 19:45:18 +00:00
Ralph Castain
e6014bf2e1 Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter
This commit was SVN r27477.

The following SVN revision numbers were found above:
  r27451 --> open-mpi/ompi@d59034e6ef
  r27456 --> open-mpi/ompi@ecdbf34937
2012-10-24 18:38:44 +00:00
Nathan Hjelm
d59034e6ef MCA: remove deprecated mca_base_param functions (mca_base_param_register_int, mca_base_param_register_string, mca_base_param_environ_variable). Remove all uses of deprecated functions.
cmr:v1.7

This commit was SVN r27451.
2012-10-17 20:17:37 +00:00
Jeff Squyres
fb2e543a57 Refs trac:3275.
We ran into a case where the OMPI SVN trunk grew a new acceptable MCA
parameter value, but this new value was not accepted on the v1.6
branch (hwloc_base_mem_bind_failure_action -- on the trunk it accepts
the value "silent", but on the older v1.6 branch, it doesn't).  If you
set "hwloc_base_mem_bind_failure_action=silent" in the default MCA
params file and then accidentally ran with the v1.6 branch, every OMPI
executable (including ompi_info) just failed because hwloc_base_open()
would say "hey, 'silent' is not a valid value for
hwloc_base_mem_bind_failure_action!".  Kaboom.

The only problem is that it didn't give you any indication of where
this value was being set.  Quite maddening, from a user perspective.

So we changed the ompi_info handles this case.  If any framework open
function return OMPI_ERR_BAD_PARAM (either because its base MCA params
got a bad value or because one of its component register/open
functions return OMPI_ERR_BAD_PARAM), ompi_info will stop, print out
a warning that it received and error, and then dump out the parameters
that it has received so far in the framework that had a problem.

At a minimum, this will show the user the MCA param that had an error
(it's usually the last one), and ''where it was set from'' (so that
they can go fix it).  

We updated ompi_info to check for O???_ERR_BAD_PARAM from each from
the framework opens.  Also updated the doxygen docs in mca.h for this
O???_BAD_PARAM behavior.  And we noticed that mca.h had MCA_SUCCESS
and MCA_ERR_??? codes.  Why?  I think we used them in exactly one
place in the code base (mca_base_components_open.c).  So we deleted
those and just used the normal OPAL_* codes instead.

While we were doing this, we also cleaned up a little memory
management during ompi_info/orte-info/opal-info finalization.
Valgrind still reports a truckload of memory still in use at ompi_info
termination, but they mostly look to be components not freeing
memory/resources properly (and outside the scope of this fix).

This commit was SVN r27306.

The following Trac tickets were found above:
  Ticket 3275 --> https://svn.open-mpi.org/trac/ompi/ticket/3275
2012-09-11 20:47:24 +00:00
Jeff Squyres
7390ab8a23 Many updates and bug fixes for the Fortran bindings. Sorry these
aren't separated out into individual commits; they represent a few
months of work in the Mercurial branch, and it seemed error-prone to
try to break them up into multiple SVN commits.

 * Remove 2nd overloaded interfaces for MPI_TESTALL, MPI_TESTSOME,
   MPI_WAITALL, and MPI_WAITSOME in the "mpi" module implementations
   (because we're not allowed to have them, anyway -- it causes
   complications in the profiling interface).  This forced an MPI-2.2
   errata in the MPI Forum; we applied the errata here (the array of
   statuses parameter could not have a specific dimension specified in
   the dummy argument).  Fixes trac:3166.
 * Similarly, fix type for MPI_ARGVS_NULL in Fortran
 * Add MPI_3.0 function MPI_F_SYNC_REG (Fortran interfaces only).
 * Add MPI-3.0 MPI_MESSAGE_NO_PROC in the mpi_f08 module.
 * Added mpi_f08 handle comparison operators, per MPI-3.0 addendum to
   the F08 proposal at the last Forum meeting.  
 * Added missing type(MPI_File) and type(Message) in mpi_f08 module.
 * Fix --disable-mpi-io configure switch with all Fortran interfaces
 * Re-factor the Fortran header files to be fundamentally simpler and
   easier to maintain.  Fortran constant values in the header files
   are now generated by a script named mpif-values.pl during
   autogen.pl (they were previously generated by mpif-common.pl, but
   it was quite a bit more subtle/complex).  A second commit will
   follow this one to update svn:ignore values (just to ensure we
   don't muck up the first commit with the SVN client getting confused
   by the changed ignore values and new/changed files).
 * Fix some dependencies for compile ordering in
   ompi/mpi/fortran/use-mpi-ignore-tkr/Makefile.am. 
 * Fix bad wording in several places (.m4 file name, ompi_info output,
   etc.): we previoulsy said "F08 assumed shape" when we really meant
   "F08 assumed rank" (for Fortran gurus, those are very different
   things). 
 * Removed the GREEK/SVN version string from mpif.h.  It really had no
   purpose being there.

Still to be done:

 * Handling of 2D array of strings in MPI_COMM_SPAWN_MULTIPLE still
   isn't right yet.  Not sure how many people really care about this
   :-), but it is still broken.

This commit was SVN r26997.

The following Trac tickets were found above:
  Ticket 3166 --> https://svn.open-mpi.org/trac/ompi/ticket/3166
2012-08-10 21:19:47 +00:00
Shiqing Fan
42dfbc7d2f Another CMake scripts update for:
correctly generate hwloc library
automatically define OMPI/OPAL/ORTE_OMPORTS for user applications
update the f77 bindings

This commit was SVN r26893.
2012-07-27 11:49:09 +00:00
Shiqing Fan
5d81c27282 Update the CMake files for Fortran 77 bindings, get ready for F90 bindings.
Change several variable names and update the macros.

This commit was SVN r26851.
2012-07-24 08:49:34 +00:00
Ralph Castain
e335de3564 Refactor ompi_info, splitting it into parts according to the layer involved. Thus, we call down to the opal layer to get those frameworks and components, and down to the orte layer to get those. Still some abstraction breaks, but they mostly involve renaming of OMPI_foo labels that have been around since before we split the build system by layer.
This commit was SVN r26695.
2012-06-28 18:23:34 +00:00
Josh Hursey
28681deffa Backout the ORCA commit. :(
There is a linking issue on Mac OSX that needs to be addressed before this is able to come back into the trunk.

This commit was SVN r26676.
2012-06-27 01:28:28 +00:00
Josh Hursey
eac4913fc6 lorca to lopen-rca
This commit was SVN r26671.
2012-06-26 21:48:02 +00:00
Josh Hursey
542330e3a7 Commit of ORCA: Open MPI Runtime Collaborative Abstraction
This is a runtime interposition project that sits between the OMPI and ORTE layers in Open MPI.

The project is described on the wiki:
  https://svn.open-mpi.org/trac/ompi/wiki/Runtime_Interposition

And on this email thread:
  http://www.open-mpi.org/community/lists/devel/2012/06/11109.php

This commit was SVN r26670.
2012-06-26 21:42:16 +00:00
Edgar Gabriel
288d044097 get rid of the fcache framework. It was not being used as originally intended.
This commit was SVN r26668.
2012-06-26 19:53:26 +00:00
Ralph Castain
e6f3586415 Remove the orte notifier framework, per discussion at the devel meeting and follow-up with Jeff (who took the action item)
This commit was SVN r26637.
2012-06-22 18:09:23 +00:00
Brian Barrett
9af72072a3 Use MKDIR_P instead of mkdir_p in Makefiles, as MKDIR_P is the only one
defined in recent versions of AC/AM.

This commit was SVN r26625.
2012-06-21 16:52:37 +00:00
Ralph Castain
0a713cd27e Add database framework to ORTE and refactor modex code to utilize it. Create the "hash" db component from the prior modex db code. Leave the other components ignored for now - will activate them later.
Modex is still a blocking operation at this point.

This commit was SVN r26618.
2012-06-19 13:38:42 +00:00
Shiqing Fan
aa6cde9886 Change f77 to fortran for the rest of windows build files.
This commit was SVN r26558.
2012-06-06 14:09:51 +00:00
Terry Dontje
951131b58d add exit code to mpijavac dependent on the result of javac command
This commit was SVN r26515.
2012-05-29 18:44:08 +00:00
Jeff Squyres
2ba10c37fe Per RFC, bring in the following changes:
* Remove paffinity, maffinity, and carto frameworks -- they've been
   wholly replaced by hwloc.
 * Move ompi_mpi_init() affinity-setting/checking code down to ORTE.
 * Update sm, smcuda, wv, and openib components to no longer use carto.
   Instead, use hwloc data.  There are still optimizations possible in
   the sm/smcuda BTLs (i.e., making multiple mpools).  Also, the old
   carto-based code found out how many NUMA nodes were ''available''
   -- not how many were used ''in this job''.  The new hwloc-using
   code computes the same value -- it was not updated to calculate how
   many NUMA nodes are used ''by this job.''
   * Note that I cannot compile the smcuda and wv BTLs -- I ''think''
     they're right, but they need to be verified by their owners.
 * The openib component now does a bunch of stuff to figure out where
   "near" OpenFabrics devices are.  '''THIS IS A CHANGE IN DEFAULT
   BEHAVIOR!!''' and still needs to be verified by OpenFabrics vendors
   (I do not have a NUMA machine with an OpenFabrics device that is a
   non-uniform distance from multiple different NUMA nodes).
 * Completely rewrite the OMPI_Affinity_str() routine from the
   "affinity" mpiext extension.  This extension now understands
   hyperthreads; the output format of it has changed a bit to reflect
   this new information.
 * Bunches of minor changes around the code base to update names/types
   from maffinity/paffinity-based names to hwloc-based names.
 * Add some helper functions into the hwloc base, mainly having to do
   with the fact that we have the hwloc data reporting ''all''
   topology information, but sometimes you really only want the
   (online | available) data.

This commit was SVN r26391.
2012-05-07 14:52:54 +00:00
Jeff Squyres
a712fc4649 More Fortran-oriented typos / fixes in the script wrapper compiler
This commit was SVN r26357.
2012-04-27 21:20:24 +00:00
Jeff Squyres
fff1612c04 * Forgot to update ompi-fort.pc.in
* Remove unused f77 reference in the script wrapper compiler

This commit was SVN r26355.
2012-04-27 20:05:09 +00:00
Jeff Squyres
5efdfdfa09 * Make mpif77 and mpif90 be sym links to mpifort, just to drive the
point home that they're deprecated
 * Similarly, make mpif77-wrapper-data.txt be a sym link to
   mpifirt-wrapper-data.txt (ditto with mpif90)
 * Make new mpif77.1 and mpif90.1 man pages that say that they're
   deprecated; use mpifort instead

This commit was SVN r26353.
2012-04-27 19:35:23 +00:00
Jeff Squyres
94de69bd0c The F77 and F90 macros are no longer needed
This commit was SVN r26348.
2012-04-27 01:11:26 +00:00
Jeff Squyres
501a86afe1 No need to include the generated files in the tarball. Thanks to
Eugene for pointing this out.

This commit was SVN r26339.
2012-04-25 14:19:18 +00:00
Jeff Squyres
253444c6d0 == Highlights ==
1. New mpifort wrapper compiler: you can utilize mpif.h, use mpi, and use mpi_f08 through this one wrapper compiler
 1. mpif77 and mpif90 still exist, but are sym links to mpifort and may be removed in a future release
 1. The mpi module has been re-implemented and is significantly "mo' bettah"
 1. The mpi_f08 module offers many, many improvements over mpif.h and the mpi module

This stuff is coming from a VERY long-lived mercurial branch (3 years!); it'll almost certainly take a few SVN commits and a bunch of testing before I get it correctly committed to the SVN trunk.

== More details ==

Craig Rasmussen and I have been working with the MPI-3 Fortran WG and Fortran J3 committees for a long, long time to make a prototype MPI-3 Fortran bindings implementation.  We think we're at a stable enough state to bring this stuff back to the trunk, with the goal of including it in OMPI v1.7.  

Special thanks go out to everyone who has been incredibly patient and helpful to us in this journey:

 * Rolf Rabenseifner/HLRS (mastermind/genius behind the entire MPI-3 Fortran effort)
 * The Fortran J3 committee
 * Tobias Burnus/gfortran
 * Tony !Goetz/Absoft
 * Terry !Donte/Oracle
 * ...and probably others whom I'm forgetting :-(

There's still opportunities for optimization in the mpi_f08 implementation, but by and large, it is as far along as it can be until Fortran compilers start implementing the new F08 dimension(..) syntax.

Note that gfortran is currently unsupported for the mpi_f08 module and the new mpi module.  gfortran users will a) fall back to the same mpi module implementation that is in OMPI v1.5.x, and b) not get the new mpi_f08 module.  The gfortran maintainers are actively working hard to add the necessary features to support both the new mpi_f08 module and the new mpi module implementations.  This will take some time.

As mentioned above, ompi/mpi/f77 and ompi/mpi/f90 no longer exist.  All the fortran bindings implementations have been collated under ompi/mpi/fortran; each implementation has its own subdirectory:

{{{
ompi/mpi/fortran/
  base/               - glue code
  mpif-h/             - what used to be ompi/mpi/f77
  use-mpi-tkr/        - what used to be ompi/mpi/f90
  use-mpi-ignore-tkr/ - new mpi module implementation
  use-mpi-f08/        - new mpi_f08 module implementation
}}}

There's also a prototype 6-function-MPI implementation under use-mpi-f08-desc that emulates the new F08 dimension(..) syntax that isn't fully available in Fortran compilers yet.  We did that to prove it to ourselves that it could be done once the compilers fully support it.  This directory/implementation will likely eventually replace the use-mpi-f08 version.

Other things that were done:

 * ompi_info grew a few new output fields to describe what level of Fortran support is included
 * Existing Fortran examples in examples/ were renamed; new mpi_f08 examples were added
 * The old Fortran MPI libraries were renamed:
   * libmpi_f77 -> libmpi_mpifh
   * libmpi_f90 -> libmpi_usempi
 * The configury for Fortran was consolidated and significantly slimmed down.  Note that the F77 env variable is now IGNORED for configure; you should only use FC. Example:
{{{
shell$ ./configure CC=icc CXX=icpc FC=ifort ...
}}}

All of this work was done in a Mercurial branch off the SVN trunk, and hosted at Bitbucket.  This branch has got to be one of OMPI's longest-running branches.  Its first commit was Tue Apr 07 23:01:46 2009 -0400 -- it's over 3 years old!  :-)  We think we've pulled in all relevant changes from the OMPI trunk (e.g., Fortran implementations of the new MPI-3 MPROBE stuff for mpif.h, use mpi, and use mpi_f08, and the recent Fujitsu Fortran patches).

I anticipate some instability when we bring this stuff into the trunk, simply because it touches a LOT of code in the MPI layer in the OMPI code base.  We'll try our best to make it as pain-free as possible, but please bear with us when it is committed.

This commit was SVN r26283.
2012-04-18 15:57:29 +00:00
Ralph Castain
bd8b4f7f1e Sorry for mid-day commit, but I had promised on the call to do this upon my return.
Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code.

Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch.

This commit was SVN r26242.
2012-04-06 14:23:13 +00:00
Jeff Squyres
97b3603036 A bunch of fixes and improvements to Open MPI's various command line tools.
* fixed some bugs where "unknown" tokens were allowed on the command
   line (which should really only be used for ortertun).
 * if an unknown token is encountered, print a short error to stderr
   and quit with a nonzero exit status
 * if we don't find the right number of parameters to an option, print
   a short error to stderr and quit with a nonzero exit status
 * when --help is given, print the help message to stdout (not stderr)
   and quit with a zero exit status
 * added --showme:help option to the wrapper compilers
 * updated docs in opal/util/cmd_line.h
 * other small/miscellaneous CLI parsing bugs in various tools

I won't bore you with what we did before.  :-)  Here's some examples
of what the new behavior looks like:

{{{
% ompi_info --bogus
ompi_info: Error: unknown option "--bogus"
Type 'ompi_info --help' for usage.
% ompi_info --param bogus
ompi_info: Error: option "--param" did not have enough parameters (2)
Type 'ompi_info --help' for usage.
%
}}}

This commit was SVN r26072.
2012-02-29 17:52:38 +00:00
Brian Barrett
aff49e98b4 Don't remove mpijavac.1 during distclean
This commit was SVN r26038.
2012-02-23 22:57:51 +00:00
Ralph Castain
0ad3f14281 Uninstall the java wrapper compiler
This commit was SVN r25979.
2012-02-21 02:54:01 +00:00
Ralph Castain
ae1f248a9a Ensure the man page gets into the tarball
This commit was SVN r25975.
2012-02-20 23:40:36 +00:00
Ralph Castain
47c64ec837 Roll in Java bindings per telecon discussion. Man pages still under revision
This commit was SVN r25973.
2012-02-20 22:12:43 +00:00
Jeff Squyres
1cab7579d4 Do not use $(RM); BSD-flavored "make"s don't understand it. Instead,
just use "rm" -- everyone has that.  Thanks to Paul Hargrove for
identifying the issue.

This commit was SVN r25910.
2012-02-13 22:13:38 +00:00
Matthias Jurenz
d2e180ffa0 Added field for VampirTrace support indicating whether it's enabled or not
This commit was SVN r25805.
2012-01-27 14:11:33 +00:00
Ralph Castain
bf103de66c My apologies for doing this outside of the usual time restrictions, but we need to get this in so we can make progress.
Move the ORTE-level debugger code back into orterun and out of the ORTE library to resolve symbol conflicts.

This commit was SVN r25713.
2012-01-11 15:53:09 +00:00
Ralph Castain
9b59d8de6f This is actually a much smaller commit than it appears at first glance - it just touches a lot of files. The --without-rte-support configuration option has never really been implemented completely. The option caused various objects not to be defined and conditionally compiled some base functions, but did nothing to prevent build of the component libraries. Unfortunately, since many of those components use objects covered by the option, it caused builds to break if those components were allowed to build.
Brian dealt with this in the past by creating platform files and using "no-build" to block the components. This was clunky, but acceptable when only one organization was using that option. However, that number has now expanded to at least two more locations.

Accordingly, make --without-rte-support actually work by adding appropriate configury to prevent components from building when they shouldn't. While doing so, remove two frameworks (db and rmcast) that are no longer used as ORCM comes to a close (besides, they belonged in ORCM now anyway). Do some minor cleanups along the way.

This commit was SVN r25497.
2011-11-22 21:24:35 +00:00
Jeff Squyres
38451d4972 Add the MPI API version to the ompi_info output. How did we never
have this in there before?

This commit was SVN r25437.
2011-11-04 23:30:59 +00:00
Jeff Squyres
1d6d39d2ea Missed this free/re-strdup
This commit was SVN r25426.
2011-11-03 11:31:37 +00:00
Jeff Squyres
6139256e45 v may get incremented, so be sure to save the ''original'' strdup'ed
pointer and free ''that'' -- not the (possibly incremented) pointer

This commit was SVN r25425.
2011-11-03 11:23:17 +00:00
Jeff Squyres
12d4280d0b Fix a bunch of memory leaks
This commit was SVN r25411.
2011-11-01 20:22:49 +00:00
George Bosilca
80c02647c8 Each level (OPAL/ORTE/OMPI) should only return it's own constants,
instead of the current mismatch.

This commit was SVN r25230.
2011-10-04 14:50:31 +00:00
Jeff Squyres
ecd603256a * Rename opal_hwloc_components to opal_hwloc_base_components
* Fix some comments

This commit was SVN r25150.
2011-09-17 11:54:36 +00:00
Jeff Squyres
f0b87bc0bd Users will have no clue what "hwloc" is -- call it "host topology
support" 

This commit was SVN r25143.
2011-09-14 19:22:08 +00:00
Jeff Squyres
dc806ee900 This macro isn't used anywhere anymore
This commit was SVN r25142.
2011-09-14 19:21:46 +00:00
Jeff Squyres
4771c36061 * With some m4 trickery, if no form of --with-hwloc is specified on
the command line, hwloc is just like any other external dependency
   in OMPI: if we find it, we'll use it. If we don't find it, we'll
   ignore it.  See comments in opal/mca/hwloc/configure.m4 for an
   explanation. 
 * Fix some copy-n-paste errors in opal/mca/hwloc/configure.m4
   w.r.t. flags coming in from the winning component.
 * Add another line in ompi_info's output about whether hwloc support
   is included or not.

This commit was SVN r25134.
2011-09-13 00:39:14 +00:00
Ralph Castain
92c7372e20 Per the RFC from Jeff, move hwloc from opal/mca/common to its own static framework ala libevent. Have ORTE daemons collect the topology info at startup and, if --enable-hwloc-xml is set, send that info back to the HNP for later use. The HNP only retains unique topology "templates" to reduce memory footprint. Have the daemon include the local topology info in the nidmap buffer sent to each app so the apps don't all hammer the local system to discover it for themselves.
Remove the sysinfo framework as hwloc replaces that functionality.

This commit was SVN r25124.
2011-09-11 19:02:24 +00:00
Edgar Gabriel
53725af281 add the new ompio frameworks to show up in the ompi_info output.
This commit was SVN r25119.
2011-09-07 14:23:28 +00:00
Shiqing Fan
20ee92c16e Make the compiler wrappers work correctly for MinGW build.
This commit was SVN r25051.
2011-08-16 12:32:41 +00:00
Shiqing Fan
3af7c9f7bb Complete the MinGW build support on Windows.
This commit was SVN r25048.
2011-08-15 09:47:23 +00:00
Samuel Gutierrez
81f38b258a commit of new shared memory backing facility framework (shmem) and its components.
This commit was SVN r24795.
2011-06-21 15:41:57 +00:00
Jeff Squyres
595dd60546 Fix off-by-one error in trimming space from the right of strings
This commit was SVN r24724.
2011-05-22 12:21:14 +00:00
Ralph Castain
ca5af216b6 Remove last vestige or orte-iof
This commit was SVN r24721.
2011-05-21 14:28:43 +00:00
Ralph Castain
8c08ee9c3d Remove stale tool
This commit was SVN r24720.
2011-05-21 00:38:35 +00:00
Ralph Castain
d34bab541d Remove the ompi-profiler tool and its attendant ompi-probe program. Also remove the grpcomm basic component since its only function was to support profiled clusters, which nobody was doing. :-(
This commit was SVN r24704.
2011-05-17 03:30:25 +00:00
Eugene Loh
2770a12beb Continue clean up of thread options started in r22841, 22842, and 22849.
No need for any CMRs to 1.5... that was already done in CMR 2728.

This commit was SVN r24545.

The following SVN revision numbers were found above:
  r22841 --> open-mpi/ompi@b400b84162
2011-03-18 21:36:35 +00:00
Jeff Squyres
3f4d4886f2 Minor update for something that has been bugging me for quite a while:
OMPI supports multiple different repository systems (SVN, hg, git).
But the VERSION file has listed "want_svn" and "svn_r" as fields, even
though the actual repo system and version may not be SVN.

So search/replace those fields (and derrivative values that come from
those fields) with "want_repo_rev" and "repo_rev", respectively.

This commit was SVN r24405.
2011-02-16 22:53:23 +00:00
Shiqing Fan
f43862420c Convert the bad dos line endings to unix style for all windows related files.
This commit was SVN r24137.
2010-12-02 12:08:08 +00:00
Shiqing Fan
828404bdad Revert r24123, it's not for the trunk.
This commit was SVN r24125.

The following SVN revision numbers were found above:
  r24123 --> open-mpi/ompi@71cb950ede
2010-12-01 11:52:09 +00:00
Shiqing Fan
71cb950ede Remove redundant linking options for mpicxx and mpic++ on Windows.
This commit was SVN r24123.
2010-12-01 11:25:44 +00:00
Shiqing Fan
358b4a5cba Add an option to enable the debug postfix for executables.
This commit was SVN r24070.
2010-11-19 15:54:13 +00:00
Jeff Squyres
e4744b4ed5 Per http://www.open-mpi.org/community/lists/devel/2010/11/8671.php,
change a bunch of OMPI_<foo> names to OPAL_<foo>.

This commit was SVN r24046.
2010-11-12 23:22:11 +00:00
Shiqing Fan
c03ea1a5f3 A more clean way to build on Windows.
It's not possible to combine two shared libraries on Windows, so we have to do it a bit different. First generate a small event static library by just linking the object files, and link it into other libraries that needs the libevent API.

This commit was SVN r24039.
2010-11-11 12:02:54 +00:00
Shiqing Fan
7bac326920 Fix Windows build, add custom command to generate static libraries (opal and orte) for shared build.
This commit was SVN r24012.
2010-11-09 08:32:45 +00:00
Ralph Castain
9ea2b196ce Convert the opal_event framework to use direct function calls instead of hiding functions behind function pointers. Eliminate the opal_object_t abstraction of libevent's event struct so it can be directly passed to the libevent functions.
Note: the ompi_check_libfca.m4 file had to be modified to avoid it stomping on global CPPFLAGS and the like. The file was also relocated to the ompi/config directory as it pertains solely to an ompi-layer component.

Forgive the mid-day configure change, but I know Shiqing is working the windows issues and don't want to cause him unnecessary redo work.

This commit was SVN r23966.
2010-10-28 15:22:46 +00:00
Ralph Castain
810dc9ad78 Fix typo
This commit was SVN r23955.
2010-10-26 19:53:17 +00:00
Ralph Castain
86c7365e8e Clean up a few initialization issues - don't think these are impacting the shared memory situation as it didn't fix the problem.
Setup the event API to support multiple bases in preparation for splitting the OMPI and ORTE events. Holding here pending shared memory resolution.

This commit was SVN r23943.
2010-10-26 02:41:42 +00:00
Ralph Castain
fceabb2498 Update libevent to the 2.0 series, currently at 2.0.7rc. We will update to their final release when it becomes available. Currently known errors exist in unused portions of the libevent code. This revision passes the IBM test suite on a Linux machine and on a standalone Mac.
This is a fairly intrusive change, but outside of the moving of opal/event to opal/mca/event, the only changes involved (a) changing all calls to opal_event functions to reflect the new framework instead, and (b) ensuring that all opal_event_t objects are properly constructed since they are now true opal_objects.

Note: Shiqing has just returned from vacation and has not yet had a chance to complete the Windows integration. Thus, this commit almost certainly breaks Windows support on the trunk. However, I want this to have a chance to soak for as long as possible before I become less available a week from today (going to be at a class for 5 days, and thus will only be sparingly available) so we can find and fix any problems.

Biggest change is moving the libevent code from opal/event to a new opal/mca/event framework. This was done to make it much easier to update libevent in the future. New versions can be inserted as a new component and tested in parallel with the current version until validated, then we can remove the earlier version if we so choose. This is a statically built framework ala installdirs, so only one component will build at a time. There is no selection logic - the sole compiled component simply loads its function pointers into the opal_event struct.

I have gone thru the code base and converted all the libevent calls I could find. However, I cannot compile nor test every environment. It is therefore quite likely that errors remain in the system. Please keep an eye open for two things:

1. compile-time errors: these will be obvious as calls to the old functions (e.g., opal_evtimer_new) must be replaced by the new framework APIs (e.g., opal_event.evtimer_new)

2. run-time errors: these will likely show up as segfaults due to missing constructors on opal_event_t objects. It appears that it became a typical practice for people to "init" an opal_event_t by simply using memset to zero it out. This will no longer work - you must either OBJ_NEW or OBJ_CONSTRUCT an opal_event_t. I tried to catch these cases, but may have missed some. Believe me, you'll know when you hit it.

There is also the issue of the new libevent "no recursion" behavior. As I described on a recent email, we will have to discuss this and figure out what, if anything, we need to do.

This commit was SVN r23925.
2010-10-24 18:35:54 +00:00
Ralph Castain
6b59614bf1 Overlooked vprotocol framework
This commit was SVN r23917.
2010-10-22 03:21:10 +00:00
Jeff Squyres
4322a78e60 Update wrapper compiler scripts to search for perl during configure, per request from BSD maintainers.
This commit was SVN r23914.
2010-10-19 22:45:54 +00:00
Ralph Castain
94ccc84d85 Not sure why I chased this one down with Jeff as nobody really seems to care...
This commit was SVN r23820.
2010-09-30 18:33:08 +00:00
Ralph Castain
3631e4e936 Revert remaining svn kruft from r23764
This commit was SVN r23786.

The following SVN revision numbers were found above:
  r23764 --> open-mpi/ompi@40a2bfa238
2010-09-22 01:11:40 +00:00
Ralph Castain
40a2bfa238 WARNING: Work on the temp branch being merged here encountered problems with bugs in subversion. Considerable effort has gone into validating the branch. However, not all conditions can be checked, so users are cautioned that it may be advisable to not update from the trunk for a few days to allow MTT to identify platform-specific issues.
This merges the branch containing the revamped build system based around converting autogen from a bash script to a Perl program. Jeff has provided emails explaining the features contained in the change.

Please note that configure requirements on components HAVE CHANGED. For example. a configure.params file is no longer required in each component directory. See Jeff's emails for an explanation.

This commit was SVN r23764.
2010-09-17 23:04:06 +00:00
Ralph Castain
e96b5f486f Reorganize the opal interface code in opal/util/if.c per prior emails and telecon discussions. Move the interface discovery code into a framework so that configuration logic can separate it out (instead of the prior #if-#else confusion).
All interface APIs for accessing the info remain unchanged in opal/util/if.c.

This has been tested on Mac, Linux, and NetBSD. Nobody else seemed interested in testing it, so there may be some future problems revealed as people try it on other OSs.

This commit was SVN r23743.
2010-09-13 01:58:51 +00:00