Commit graph

3972 commits

Author SHA1 Message Date
Jeff Squyres
fc7c58ede6 Missed updating the topo base check to look for v2.0.0, causing all
topology-related MPI tests to fail.

This commit was SVN r19088.
2008-07-30 00:50:42 +00:00
Donald Kerr
2899f64146 moving vendor_id info to top of file
This commit was SVN r19087.
2008-07-29 22:33:17 +00:00
Donald Kerr
513225c9f3 add Sun Microsystems, Inc. vendor_id
This commit was SVN r19085.
2008-07-29 19:31:41 +00:00
George Bosilca
2d8cbc6ade Allow other BTL to work even if they collide with regard to the exclusivity. The problem was
that by decreasing the btl_inuse if there was already a registered BTL we basically reset
the changes for this new BTL to register its progress function, even if it was supposed to
handle another peer.

This commit was SVN r19080.
2008-07-29 18:38:11 +00:00
Jeff Squyres
388c047579 Ensure to actually register the proper synonym index
This commit was SVN r19074.
2008-07-29 13:05:39 +00:00
Jeff Squyres
0af7ac53f2 Fixes trac:1392, #1400
* add "register" function to mca_base_component_t
   * converted coll:basic and paffinity:linux and paffinity:solaris to
     use this function
   * we'll convert the rest over time (I'll file a ticket once all
     this is committed)
 * add 32 bytes of "reserved" space to the end of mca_base_component_t
   and mca_base_component_data_2_0_0_t to make future upgrades
   [slightly] easier
   * new mca_base_component_t size: 196 bytes
   * new mca_base_component_data_2_0_0_t size: 36 bytes
 * MCA base version bumped to v2.0
   * '''We now refuse to load components that are not MCA v2.0.x'''
 * all MCA frameworks versions bumped to v2.0
 * be a little more explicit about version numbers in the MCA base
   * add big comment in mca.h about versioning philosophy

This commit was SVN r19073.

The following Trac tickets were found above:
  Ticket 1392 --> https://svn.open-mpi.org/trac/ompi/ticket/1392
2008-07-28 22:40:57 +00:00
Jeff Squyres
a7c79558ad Remove an errant printf that was causing non-fatal errors to be
displayed (e.g., when deleting files that should have been silently
ignored if they didn't exist).

This commit was SVN r19071.
2008-07-28 22:11:31 +00:00
Jeff Squyres
c93f1d8c5e Fixes trac:1419: got a patch from upstream to fix the "OS X doesn't
have/need lseek64" issue.  This fix is also included in MPICH2 after
v1.0.7.

This commit was SVN r19070.

The following Trac tickets were found above:
  Ticket 1419 --> https://svn.open-mpi.org/trac/ompi/ticket/1419
2008-07-28 17:02:53 +00:00
Ralph Castain
a0ae63f19e Ensure we call close_port after comm_spawn[_multiple]. Cleanout the port name in close_port
This commit was SVN r19068.
2008-07-28 16:40:11 +00:00
Lenny Verkhovsky
f278609d77 Funny warning message about rd_win size fix
This commit was SVN r19063.
2008-07-28 14:30:57 +00:00
Adrian Knoth
5096512c3a Cosmetics, only typos.
This commit was SVN r19061.
2008-07-28 13:33:08 +00:00
Jeff Squyres
477cdb0b62 Had to abandon the first approach from r19040: it caused problems with
"make distclean".  It's not clear whether it's an Automake bug or
whether what I did simply is not supported (I've got pending mail into
Ralf W. asking about it).  The short version is that during "make
distclean", ompi/mpi/f77/Makefile would rm -rf ompi/mpi/f77/.deps.
But ompi/Makefile still includes some .Plo files from that directory,
so Bad Things happened when "make distclean" unrolled from the
ompi/mpi/f77 dir back up to the ompi/ dir.

So I went with George's original suggestion and moved the f77 "base"
files in question into a new directory: ompi/mpi/f77/base and put a
Makefile.include in there.  That way, this directory is not traversed
twice by distclean, and .deps is only removed when it is supposed to
be.  Maybe we'll be able to do it a little better someday, but that's
the way it is now.

I'll check this with a fresh checkout once this is committed to SVN as
well; some of these kinds of problems don't show up until you do a
build from a completely fresh SVN checkout.

This commit was SVN r19054.

The following SVN revision numbers were found above:
  r19040 --> open-mpi/ompi@9f4d4c4312
2008-07-26 20:38:30 +00:00
Jeff Squyres
505ffc6719 This file was never used
This commit was SVN r19053.
2008-07-26 19:27:18 +00:00
Jeff Squyres
78b8bac900 First part of "make distcheck" fix, but at least one more commit will
be required for a total fix.  Still looking into the problem...

This commit was SVN r19049.
2008-07-26 13:25:11 +00:00
Jeff Squyres
7c4d46a8d9 Grrr: I ''did'' remove these files on the initial commit of the new
SVN version (r19045), but I also edited the svn:ignore to ignore these
files in the same SVN commit -- I suspect that SVN got confused and
did not actually delete them.

This commit was SVN r19048.

The following SVN revision numbers were found above:
  r19045 --> open-mpi/ompi@63b63d48c3
2008-07-26 13:00:24 +00:00
Jeff Squyres
6dac4706ea Somehow this file got missed in the SVN import.
This commit was SVN r19047.
2008-07-26 12:54:15 +00:00
Jeff Squyres
63b63d48c3 Fixes trac:1370, #1147
Update the version of ROMIO to that which was contained in
MPICH2-1.0.7, plus a few patches from the upstream ROMIO maintainers
(because OMPI uses a few code paths in ROMIO that MPICH2 does not;
there were a few compile bugs in the ROMIO from MPICH2-1.0.7).

Added an info MCA param to be able to tell which version of ROMIO is
contained in OMPI: io_romio_version.

Many, many thanks to romio-maint@mcs.anl.gov for all their help in
integrating this new version of ROMIO into Open MPI.

This commit was SVN r19045.

The following Trac tickets were found above:
  Ticket 1370 --> https://svn.open-mpi.org/trac/ompi/ticket/1370
2008-07-26 12:23:30 +00:00
Jeff Squyres
9f4d4c4312 Fixes trac:1409: ensure that the C++, F77, and F90 bindings libraries
are properly linked against libmpi.la.

This required a little creative AM usage, inspired by discussion on
OMPI devel list:

 * Make a new ompi/mpi/f77/Makefile_f77base.include; effectively move
   the building of the f77 "base" glue stuff (libmpi_f77base.la) into
   this Makefile and away from ompi/mpi/f77/Makefile.am.  The sources
   in question require some specific CPPFLAGS, so we couldn't just add
   the raw sources into libmpi_la_SOURCES, unfortunately.
 * Include this new Makefile in the top-level ompi/Makefile.am
 * The libmpi_f77base.la LT convenience library was already sucked
   into libmpi.la; breaking it out into its own Makefile allows us
   to build it earlier and therefore complete building libmpi.la
   earlier.
 * Side effect: the ompi/mpi/Makefile.am is now mostly unnecessary; it
   no longer specifies a SUBDIRS for each of the bindings directories
   to traverse into (since they are now in the top-level SUBDIRS).  As
   such, the man pages are now also included in the top-level
   ompi/Makefile.am.

The end result is that libmpi.la -- including a few sources
from mpi/f77 -- is fully built before the C++, F77, and F90 bindings
are built.  Therefore, the C++, F77, and F90 bindings libraries can
all link against libmpi.la.

This commit was SVN r19040.

The following Trac tickets were found above:
  Ticket 1409 --> https://svn.open-mpi.org/trac/ompi/ticket/1409
2008-07-25 21:18:05 +00:00
Jeff Squyres
e3e79c0881 Fixes trac:1379:
* Use synonym/deprecated MCA param API for some mca base params
 * In openib BTL, if we have appropriate memory hooks support, and if
   mpi_leave_pinned and mpi_leave_pinned_pipeline were not set by the
   user, set mpi_leave_pinned to 1.
 * Defer checking mpi_leave_pinned_* until as late as possible (i.e.,
   until after the btl's have had a chance to set mpi_leave_pinned to
   1):
   * in ob1 pml
   * in rdma mpool

This commit was SVN r19022.

The following Trac tickets were found above:
  Ticket 1379 --> https://svn.open-mpi.org/trac/ompi/ticket/1379
2008-07-24 22:51:26 +00:00
George Bosilca
6c21851160 Only register the BTL progress function if there is a need for it. This requires
a little bit more than "BTL was able to add some procs". The real condition to
allow the BTL progress is that we will use it to send/recv data to/from some
of the peers (this includes the BTL exclusivity in the process).

This commit was SVN r19010.
2008-07-24 10:33:17 +00:00
Jeff Squyres
5b9219565c Remove the use of __cpu_to_be64() and replace it with hton64().
This commit was SVN r18995.
2008-07-23 12:08:55 +00:00
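
A note on the hton64() change above: a 64-bit host-to-network conversion amounts to emitting the value in big-endian byte order regardless of the host's own endianness. The sketch below illustrates that idea in Python rather than in Open MPI's C. The names hton64_bytes and ntoh64_bytes are hypothetical and are not the project's implementation.

```python
import struct
import sys


def hton64_bytes(value: int) -> bytes:
    """Return an unsigned 64-bit integer as 8 bytes in network (big-endian) order."""
    # ">Q" means big-endian, unsigned 64-bit. The result does not depend on
    # the byte order of the machine running this code.
    return struct.pack(">Q", value)


def ntoh64_bytes(data: bytes) -> int:
    """Inverse of hton64_bytes: decode 8 network-order bytes into an integer."""
    (value,) = struct.unpack(">Q", data)
    return value


if __name__ == "__main__":
    wire = hton64_bytes(0x0102030405060708)
    # Show the host byte order next to the (always big-endian) wire bytes.
    print(sys.byteorder, wire.hex())
```

The point of a helper like this is portability: it does not rely on a platform-specific macro, so the same call works on little-endian and big-endian hosts.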
Ralph Castain
83e7c19d33 Remove deprecated function - this was incorporated into the paffinity framework a long time ago. Fortunately, nobody was actually using it!
This commit was SVN r18990.
2008-07-23 03:43:31 +00:00
Jeff Squyres
2f208f885c Fixes trac:1295: change language in openib BTL from IB-specific to be
"!OpenFabrics" / neutral (i.e., refer to IB and/or iWARP).

 * Mostly just type, variable/field, and function name changes, such as
   s/hca/device/g, etc.  
 * Changed the INI file for the hardware-specific parameters to be
   mca-btl-openib-device-params.ini.
 * Updated a lot of help messages in the help-*.txt files, not just to
   update it to be !OpenFabrics/neutral language, but also for some
   consistency of tone, indenting, etc.
 * Deprecated a bunch of MCA params in favor of language-neutral new
   ones:
   * btl_openib_warn_no_hca_params_found (s/hca/device/)
   * btl_openib_hca_param_files
   * btl_openib_ib_cq_size (s/_ib_/_of_/)
   * btl_openib_ib_max_inline_data
   * btl_openib_ib_psn
   * btl_openib_ib_mtu
   * btl_openib_ib_pkey_ix
   * btl_openib_ib_pkey_val

This commit was SVN r18985.

The following Trac tickets were found above:
  Ticket 1295 --> https://svn.open-mpi.org/trac/ompi/ticket/1295
2008-07-23 00:28:59 +00:00
Aurelien Bouteiller
086cb6190e Use the generic version number instead of hardcoded ones
This commit was SVN r18983.
2008-07-22 21:10:51 +00:00
Jon Mason
f80404d991 Add openib error handling during wireup for rdmacm
The rdmacm event handler has no way of reporting fatal errors to the upper
layers.  By calling mca_btl_openib_endpoint_invoke_error in the rdmacm event
handler for the errors encountered, these errors can now be handled
appropriately.

Closes out Ticket #1283

This commit was SVN r18980.
2008-07-22 19:03:13 +00:00
Jeff Squyres
d37a25a2d0 Remove per http://www.open-mpi.org/community/lists/devel/2008/07/4386.php
This commit was SVN r18972.
2008-07-22 00:57:23 +00:00
Jeff Squyres
25bcf0f1d3 Oops -- cut-n-paste-error -- use OMPI in the OMPI layer...
This commit was SVN r18966.
2008-07-21 20:07:37 +00:00
Jeff Squyres
54dbd95243 Fix some component version numbers to be the same as the OMPI release
This commit was SVN r18965.
2008-07-21 20:05:29 +00:00
Ralph Castain
3137ed9255 Update the manpages for comm_spawn(_multiple) - add man page to explain host/hostfile behavior
This commit was SVN r18961.
2008-07-21 17:58:12 +00:00
Pavel Shamis
849a8f86a7 Bug fix for #1388 - fixing ib_cm_listen() random failures.
This commit was SVN r18952.
2008-07-20 06:21:32 +00:00
Jeff Squyres
e28f71f6e6 Have ompi_info also output where the value was set from (default,
environment, file, or API override).

Refs trac:1397

This commit was SVN r18943.

The following Trac tickets were found above:
  Ticket 1397 --> https://svn.open-mpi.org/trac/ompi/ticket/1397
2008-07-18 10:52:21 +00:00
Rolf vandeVaart
9c080b27d6 Fix for bug when running 64-bit heterogeneous.
This commit fixes trac:1341.

This commit was SVN r18940.

The following Trac tickets were found above:
  Ticket 1341 --> https://svn.open-mpi.org/trac/ompi/ticket/1341
2008-07-17 19:04:40 +00:00
George Bosilca
3ba0a8c0c1 In the case where the environment is homogeneous we can ALWAYS create
the receiver convertor when we create the request (as we know all
architectures are identical).

This commit was SVN r18934.
2008-07-17 04:57:55 +00:00
George Bosilca
902a2892b6 Fix typo.
This commit was SVN r18933.
2008-07-17 04:55:23 +00:00
George Bosilca
cb66115512 Add more optimizations in the case where heterogeneous support
is not enabled.

This commit was SVN r18932.
2008-07-17 04:54:47 +00:00
George Bosilca
939fa3001d Small cleanups. Remove some switch cases that cannot be reached. Rename
a struct field.

This commit was SVN r18931.
2008-07-17 04:50:39 +00:00
George Bosilca
319a8b3219 Once matched the proc attached to the request should be the source
of the message and not the first on the list. This fixes ticket
#1386.

This commit was SVN r18929.
2008-07-17 03:04:28 +00:00
Andrew Friedley
d479d981a7 Missed this on my last commit regarding if include/exclude -- should now iterate from 0, not 1.
This commit was SVN r18924.
2008-07-16 14:15:25 +00:00
Aurelien Bouteiller
66463cb258 Fix the annoying message from showing up when not using PML V. The underlying bug is not fixed though, but at least people not involved in FT dev should not see it anymore.
Fix ticket https://svn.open-mpi.org/trac/ompi/ticket/1328

Aurelien

This commit was SVN r18917.
2008-07-15 22:05:40 +00:00
Andrew Friedley
dabe6defb3 Add support for btl_ofud_{in,ex}clude MCA parameters.
This commit was SVN r18916.
2008-07-15 17:57:52 +00:00
Jeff Squyres
fd64e35f02 Fix a typo in the filename
This commit was SVN r18914.
2008-07-15 11:49:40 +00:00
Pavel Shamis
e233c11df0 Disabling IBCM cpc by default. In order to enable it, you need to define it explicitly in command line.
This commit was SVN r18911.
2008-07-15 08:02:00 +00:00
Pavel Shamis
807c53c7b1 "failed to ib_cm_listen 10 times" is a known bug in IBCM, and we should print it
only if IBCM was explicitly requested.

This commit was SVN r18897.
2008-07-13 16:36:41 +00:00
Pavel Shamis
12379e7f3e Fixing race condition between main thread and async event thread
during openib finalization.

This commit was SVN r18895.
2008-07-13 16:21:49 +00:00
Ralph Castain
0abf5225a8 Update the man pages for MPI_Publish, MPI_Unpublish, MPI_Lookup, and MPI_Comm_spawn to cover scoping rules and new info keys
This commit was SVN r18891.
2008-07-11 19:09:50 +00:00
Ralph Castain
7834201f69 Silence unused var warning
This commit was SVN r18888.
2008-07-11 15:39:59 +00:00
Jeff Squyres
583bf425c0 Fixes trac:1383:
Short version: remove opal_paffinity_alone and restore
mpi_paffinity_alone.  ORTE makes various information available for the
MPI layer to decide what it wants to do in terms of processor
affinity.

Details:

 * remove opal_paffinity_alone MCA param; restore mpi_paffinity_alone
   MCA param
 * move opal_paffinity_slot_list param registration to paffinity base
 * ompi_mpi_init() calls opal_paffinity_base_slot_list_set(); if that
   succeeds use that.  If no slot list was set, see if
   mpi_paffinity_alone was set.  If so, bind this process to its Node
   Local Rank (NLR).  The NLR is the ORTE-maintained slot ID; if you
   COMM_SPAWN to a host in this ORTE universe that already has procs
   on it, the NLR for the new job will start at N (not 0).  So this is
   slightly better than mpi_paffinity_alone in the v1.2 series.
 * If a slot list is specified *and* mpi_paffinity_alone is set, we
   display an error and abort.
 * Remove calls from rmaps/rank_file component to register and lookup
   opal_paffinity mca params. 
 * Remove code in orte/odls that set affinities - instead, have them
   just pass a slot_list if it exists. 
 * Cleanup the orte/odls code that determined
   oversubscribed/want_processor as these were just opposites of each
   other.

This commit was SVN r18874.

The following Trac tickets were found above:
  Ticket 1383 --> https://svn.open-mpi.org/trac/ompi/ticket/1383
2008-07-10 21:12:45 +00:00
Pavel Shamis
a34bb98f8a Bug fix for #1376.
If IBCM was explicitly specified with exclude/include parameter,
OpenIB BTL will enable verbose report for "/dev/infiniband/ucm" error,
otherwise the error will not be reported.

This commit was SVN r18868.
2008-07-10 15:08:49 +00:00
Ralph Castain
0532aeb368 Truly minor memory leak fixes that -might- be picked up by valgrind if someone does "ompi_info -h" or ompi_info encounters a bizarre error
This commit was SVN r18865.
2008-07-10 11:56:41 +00:00
Jeff Squyres
49be4b1e45 Fixes trac:1383
Lenny and I went back and forth on whether we should simply register
another "mpi_paffinity_alone" MCA param and then try to figure out
which one was set in ompi_mpi_init, but there was difficulty in
figuring out what to do.  So it seemed like the Right Thing to do was
to implement what was committed in r18770; then we could tell where
MCA parameters were set from and you could do Better Things (this is
also useful in the openib BTL, where parameters can be set either via
MCA parameter or via an INI file).

But after that was done, it seemed only a few steps further to
actually implement two new features in the MCA params area:

 * Synonyms (where one MCA param name is a synonym for another)
 * Allow MCA params and/or their synonyms to be marked as "deprecated"
   (printing out warnings if they are used)

These features have actually long been discussed/desired, and I had
some time in airports and airplanes recently where I could work in
this stuff on a standalone laptop.  So I did it.  :-)

This commit introduces these two new features, and then uses them to
register mpi_paffinity_alone as a non-deprecated synonym for
opal_paffinity_alone.  A few other random points in this commit:

 * Add a few error checks for conditions that were not checked before
 * Correct some comments in mca_base_params.h
 * Add a few comments in strategic places
 * ompi_info now prints additional information:
   * for any MCA parameter that has synonyms, it lists all the
     synonyms
   * synonyms are also output as 1st-class MCA params, but with an
     additional attribute indicating that they have a "parent"
   * all MCA param names (both "real" and "synonym") will output an
     attribute indicating whether it is deprecated or not.  A synonym
     is deprecated if it itself is marked as deprecated (via the
     mca_base_param_regist_syn() or mca_base_param_register_syn_name()
     functions) or if its "parent" MCA parameter is deprecated

This commit was SVN r18859.

The following SVN revision numbers were found above:
  r18770 --> open-mpi/ompi@8efe67e08c

The following Trac tickets were found above:
  Ticket 1383 --> https://svn.open-mpi.org/trac/ompi/ticket/1383
2008-07-10 01:44:51 +00:00
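
The synonym and deprecation rules described in the commit above can be sketched as a small registry. This is an illustration in Python under stated assumptions, not the MCA base code: the class and method names (ParamRegistry, register, register_synonym, is_deprecated) are hypothetical. It models only the rules stated in the message: a synonym resolves to its "parent" parameter, and a synonym counts as deprecated if it is marked deprecated itself or if its parent is.

```python
import warnings
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Param:
    name: str
    value: object = None
    deprecated: bool = False
    parent: Optional[str] = None  # set only for synonyms
    synonyms: List[str] = field(default_factory=list)


class ParamRegistry:
    """Hypothetical parameter table with synonyms and deprecation flags."""

    def __init__(self) -> None:
        self._params: Dict[str, Param] = {}

    def register(self, name: str, default: object, deprecated: bool = False) -> None:
        self._params[name] = Param(name, default, deprecated)

    def register_synonym(self, parent: str, synonym: str, deprecated: bool = False) -> None:
        if parent not in self._params:
            raise KeyError(f"unknown parent parameter: {parent}")
        self._params[parent].synonyms.append(synonym)
        # The synonym is listed as its own entry, with a pointer to its parent.
        self._params[synonym] = Param(synonym, None, deprecated, parent=parent)

    def _resolve(self, name: str) -> Param:
        # Follow the parent pointer so reads and writes land on the real param.
        param = self._params[name]
        return self._params[param.parent] if param.parent else param

    def is_deprecated(self, name: str) -> bool:
        param = self._params[name]
        if param.deprecated:
            return True
        return param.parent is not None and self._params[param.parent].deprecated

    def set(self, name: str, value: object) -> None:
        if self.is_deprecated(name):
            warnings.warn(f"parameter {name!r} is deprecated", DeprecationWarning)
        self._resolve(name).value = value

    def get(self, name: str) -> object:
        return self._resolve(name).value
```

With this shape, registering mpi_paffinity_alone as a non-deprecated synonym of opal_paffinity_alone means a value set through either name is visible through both, and no warning is raised for either.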
Aurelien Bouteiller
4d080e1c13 Fix a bug where the list sentinel was added instead of the new element to the proc_list.
This commit was SVN r18854.
2008-07-09 20:45:24 +00:00
George Bosilca
3de0488410 Fix the truncation problem. This closes #211.
This commit was SVN r18850.
2008-07-09 17:38:41 +00:00
Pak Lui
d47a6ddd02 Correct some of the info about the options for the man page.
This commit was SVN r18847.
2008-07-09 14:21:08 +00:00
Brad Benton
9f0280bd55 arghh...I inadvertently checked this in to the 1.3 branch rather than
first to the trunk.  So, here is the trunk checkin:

The call to orte_show_help() to notify truncation of the max_inline value
was missing the want_error_header boolean, which eventually results in
a SEGV.  This change corrects the call with the bool set to true.

This commit was SVN r18839.
2008-07-08 15:28:53 +00:00
Pavel Shamis
452141bfb8 Bugfix for #1375.
- Adding configure options that allow disabling IB/RDMA-CM support.
- Code cleanup in openib section of configure

This commit was SVN r18830.
2008-07-08 06:32:54 +00:00
Ralph Castain
613d0f8017 Missed file...just some comment changes, but important ones
This commit was SVN r18828.
2008-07-08 04:02:31 +00:00
Ralph Castain
cf353a1412 Complete the revisions per Brian's email to devel list, plus lengthy discussions between Brian, Jeff, and myself.
These are mostly long additions to comments to document what is going on and why, and how/where it may be revised in the future. Just a couple of small, but important, changes to the code itself.

This commit was SVN r18827.
2008-07-08 03:56:51 +00:00
Jeff Squyres
83987fea75 Next step: Back out r17543 (fixing a bunch of ROMIO warnings). Let's
see how the next gen panasas stuff does in terms of warnings; we can
always re-merge this later if we want to.  It's just easier if we have
as little OMPI-specific code as possible (particularly when we know
that the panasas code has some big changes coming).

This commit was SVN r18823.

The following SVN revision numbers were found above:
  r17543 --> open-mpi/ompi@b4ec81a9fd
2008-07-07 23:22:26 +00:00
Jeff Squyres
09ff80ff06 Back out r16691 and r16693 because the meat of them are upstream
already, and we're just about to do a ROMIO version refresh -- so the
less OMPI-specific code we have (e.g., indenting and whatnot), the
better. 

Refs trac:1370.

This commit was SVN r18821.

The following SVN revision numbers were found above:
  r16691 --> open-mpi/ompi@8dca19cb3b
  r16693 --> open-mpi/ompi@037a533752

The following Trac tickets were found above:
  Ticket 1370 --> https://svn.open-mpi.org/trac/ompi/ticket/1370
2008-07-07 22:33:49 +00:00
Jeff Squyres
a6cfe0c574 Remove LANL-specific Panasas patches. This is step 1 in upgrading the
ROMIO in Open MPI (the new version of ROMIO will make this patch
defunct, and David Daniel has confirmed that no one at LANL is using
this functionality, anyway).

Refs trac:1370.

This commit was SVN r18819.

The following Trac tickets were found above:
  Ticket 1370 --> https://svn.open-mpi.org/trac/ompi/ticket/1370
2008-07-07 22:08:26 +00:00
Edgar Gabriel
798f47b430 Fixes ticket #1334
hierarch disables itself now if the pml module used is *not* ob1. The reason
is, that the multi-level hierarchy detection algorithm checks the names of the
btl modules used. In case there are no btl's, we would segfault.

Furthermore, three minor changes:
 - the 2-level hierarchy detection is now the default (sm vs. everything else
 in the world).
 - add udapl to the list of protocols checked for by the multi-level hierarch detection
 - some of the verbose statements of hierarch were inaccurate. Fixed those comments/messages.

This commit was SVN r18817.
2008-07-07 18:44:48 +00:00
Ralph Castain
2a1e0a2e64 Fix ticket #1267
With help from Brian, modify the ompi/proc/proc.c code to be more thread-safe. Remove the list operations from the ompi_proc_t constructor and destructor. Insert list appends to ompi_proc_init and ompi_proc_find_and_add as required, and protect those with thread locks. Let only the ompi_proc_finalize function actually remove objects from the ompi_proc_list.

Cleanup a few places where functions might return without unlocking a thread. Ensure the ompi_proc_world also does an OBJ_RETAIN so that the reference count on any subsequently released object is correct.

This commit was SVN r18816.
2008-07-07 17:39:49 +00:00
Jeff Squyres
74aa9689e4 From an initial patch from George, update all the set/get errhandler
functions to use atomics in order to be thread safe.

This commit was SVN r18807.
2008-07-03 19:28:02 +00:00
Ralph Castain
ba5498cdc6 Repair the MPI-2 dynamic operations. This includes:
1. repair of the linear and direct routed modules

2. repair of the ompi/pubsub/orte module to correctly init routes to the ompi-server, and correctly handle failure to correctly parse the provided ompi-server URI

3. modification of orterun to accept both "file" and "FILE" for designating where the ompi-server URI is to be found - purely a convenience feature

4. resolution of a message ordering problem during the connect/accept handshake that allowed the "send-first" proc to attempt to send to the "recv-first" proc before the HNP had actually updated its routes.

Let this be a further reminder to all - message ordering is NOT guaranteed in the OOB

5. Repair the ompi/dpm/orte module to correctly init routes during connect/accept.

Reminder to all: messages sent to procs in another job family (i.e., started by a different mpirun) are ALWAYS routed through the respective HNPs. As per the comments in orte/routed, this is REQUIRED to maintain connect/accept (where only the root proc on each side is capable of init'ing the routes), allow communication between mpirun's using different routing modules, and to minimize connections on tools such as ompi-server. It is all taken care of "under the covers" by the OOB to ensure that a route back to the sender is maintained, even when the different mpirun's are using different routed modules.

6. corrections in the orte/odls to ensure proper identification of daemons participating in a dynamic launch

7. corrections in build/nidmap to support update of an existing nidmap during dynamic launch

8. corrected implementation of the update_arch function in the ESS, along with consolidation of a number of ESS operations into base functions for easier maintenance. The ability to support info from multiple jobs was added, although we don't currently do so - this will come later to support further fault recovery strategies

9. minor updates to several functions to remove unnecessary and/or no longer used variables and envar's, add some debugging output, etc.

10. addition of a new macro ORTE_PROC_IS_DAEMON that resolves to true if the provided proc is a daemon

There is still more cleanup to be done for efficiency, but this at least works.

Tested on single-node Mac, multi-node SLURM via odin. Tests included connect/accept, publish/lookup/unpublish, comm_spawn, comm_spawn_multiple, and singleton comm_spawn.

Fixes ticket #1256

This commit was SVN r18804.
2008-07-03 17:53:37 +00:00
Lenny Verkhovsky
ba1fa73881 Fix: select Maffinity only if Paffinity is selected
This commit was SVN r18797.
2008-07-03 13:39:34 +00:00
Jeff Squyres
7897db314e Add in the use of MPIR_being_debugged for DDT.
This commit was SVN r18796.
2008-07-03 12:27:35 +00:00
Jeff Squyres
160ba5fe11 Set max_inline_data for the iWARP adapters to be 64
This commit was SVN r18782.
2008-06-30 14:25:32 +00:00
Matthias Jurenz
c0ea3635b6 Improved passing of OMPI configure arguments to VT's configure (Ticket #1353)
This commit was SVN r18779.
2008-06-30 13:32:04 +00:00
Pavel Shamis
eaa7676c57 Changing default maximum inline data size
from Maximum_Supported_By_Device to 128 (our original value).

This commit was SVN r18774.
2008-06-30 07:47:09 +00:00
Jeff Squyres
59b7665a3a Clarify and correct comments
This commit was SVN r18768.
2008-06-28 14:15:04 +00:00
Jeff Squyres
f42f55f84b Several improvements and fixes to the openib IBCM CPC:
* Move the passive side QP move to RTS to before we send the reply
   (vs. sending it after we get the RTU).  A lengthy comment explains
   the need for this.
 * Add some timers to the code for analyzing where time is spent.
 * Clarify a few error messages.
 * Currently have a loop around ib_cm_listen() because sometimes it
   fails for seemingly no reason.  Have pending e-mails in to Sean
   Hefty to see if we can figure out why this is happening.  Note that
   the more MPI processes you add, the more likely this error is to
   occur (e.g., ran 720 processes and it happens at least 50% of the
   time).  This makes IBCM somewhat unattractive for general use;
   hopefully we can get a fix...

This commit was SVN r18766.
2008-06-28 10:53:58 +00:00
Jeff Squyres
67933e743c Move some of the things I did in r18762 to the openib BTL proper (in
endpoint.c) because it's almost identical in all the CPC's.  OOB,
XOOB, and IBCM now all invoke the btl error handler properly if
there's an error during wireup.  RDMACM still needs to be done.

This commit was SVN r18764.

The following SVN revision numbers were found above:
  r18762 --> open-mpi/ompi@3eda04578f
2008-06-27 22:48:45 +00:00
Jeff Squyres
3eda04578f Clean up a lot of error handling in IBCM CPC; properly pass error to
upper level btl (and therefore the PML) when something goes wrong
during wireup.

Refs trac:1283.

This commit was SVN r18762.

The following Trac tickets were found above:
  Ticket 1283 --> https://svn.open-mpi.org/trac/ompi/ticket/1283
2008-06-27 21:02:05 +00:00
Shiqing Fan
d129578694 Small fix for including unistd.h header file.
This commit was SVN r18758.
2008-06-27 16:25:31 +00:00
Jeff Squyres
ad16d8f335 Fix some issues in the ibcm CPC:
* Properly handle non-symmetric subnet ID's
 * Be a bit more stringent when checking for the GID
 * Add lots of BTL_VERBOSE's for diagnostics

This commit was SVN r18754.
2008-06-26 20:23:56 +00:00
Ralph Castain
f4621af954 Restore the route initialization to the global server, if one is specified. This enables us to publish/lookup to a global server when using the direct routed module.
This commit was SVN r18753.
2008-06-26 18:02:45 +00:00
Jeff Squyres
90576a435b Fixes trac:1345
The issue is that the field mca_topo_base_comm_t->mtc_periods_or_edges
has a different length, depending on whether the communicator is a
graph or a cart.  One of the comm dup functions always assumed that it
was the length required by graph comms, which could lead to badness in
some cases.  This commit makes the length of that field on a comm dup
be the proper length and copies the data over appropriately.

I also changed the syntax of the ompi_comm_copy_topo() function to use
shorter pointer notation; it made the code much easier to read and
fix. 

This commit was SVN r18752.

The following Trac tickets were found above:
  Ticket 1345 --> https://svn.open-mpi.org/trac/ompi/ticket/1345
2008-06-26 16:59:31 +00:00
Ralph Castain
6af8a73dc0 Modify the checking logic to look for NULL return
This commit was SVN r18749.
2008-06-26 14:08:36 +00:00
Ralph Castain
af8c167861 May be picky, but cleanup before returning in error conditions
This commit was SVN r18748.
2008-06-26 13:31:36 +00:00
Ralph Castain
3631a60181 Update the PML selection logic to detect when a modex is required, and in those cases to only have rank=0 report its selected module. This is per the email thread on the devel list:
http://www.open-mpi.org/community/lists/devel/2008/06/4223.php

This commit was SVN r18747.
2008-06-26 13:22:48 +00:00
Jeff Squyres
dd563f9297 Add more output to btl_base_verbose
This commit was SVN r18743.
2008-06-25 20:16:34 +00:00
Jeff Squyres
b8b1aded4e Hook all the ibcm diagnostics up to "--mca btl_base_verbose 1":
* Use BTL_VERBOSE
 * Remove trailing \n's
 * Remove redundant prefixes (BTL_VERBOSE outputs __FILE__, function,
   and __LINE__)

This commit was SVN r18742.
2008-06-25 19:19:03 +00:00
Jeff Squyres
26f6d9cf0c Properly protect the use of dev->transport_type and
IBV_TRANSPORT_IWARP.

This commit was SVN r18738.
2008-06-25 14:50:59 +00:00
Jeff Squyres
64034b65d0 Remove debugging message kruft
This commit was SVN r18737.
2008-06-25 11:32:33 +00:00
George Bosilca
df9e54194b Fix a deadlock in the put protocol for MX. The callback was never triggered,
as the send and the put share the same execution path and in the case of the
put the MCA_BTL_DES_SEND_ALWAYS_CALLBACK was not set.

This commit was SVN r18735.
2008-06-25 03:45:30 +00:00
George Bosilca
24316b77cf This commit is related to ticket #1362. The following devices respect the
behavior described on the ticket: elan, gm, mx, self, sm, tcp.

This commit was SVN r18734.
2008-06-25 03:35:07 +00:00
Jeff Squyres
edad3b66a2 Add warning message if you request one max_inline_data value and the
device reduces it.

This commit was SVN r18728.
2008-06-24 20:40:03 +00:00
Jon Mason
eefc8956b6 Fix a bug introduced in r18725
ompi_info will crash due to a reference to an unallocated struct.  Add a check for
this struct to prevent this crash.

This commit was SVN r18727.

The following SVN revision numbers were found above:
  r18725 --> open-mpi/ompi@c2351d39f1
2008-06-24 20:38:47 +00:00
Brian Barrett
ac337114e6 Use ' ' instead of / / for the regex in split() when preparing the different
lists of flags for exec.  ' ' is a magic value that means match all
white space, and trim leading / trailing whitespace.  Will prevent
many spurious arguments to underlying compiler.

This commit was SVN r18726.
2008-06-24 19:37:05 +00:00
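
The whitespace-splitting behavior described in the commit above has a close analogue in Python, shown here as an illustration only (the original change concerns Perl's split). The helper names and the sample flag string are hypothetical. Splitting on a single literal space produces empty fields wherever spaces repeat, while splitting with no separator treats any run of whitespace as one delimiter and ignores leading and trailing whitespace. Perl's handling of trailing empty fields differs in detail from Python's, so treat this as analogous rather than identical.

```python
from typing import List


def split_on_single_space(flags: str) -> List[str]:
    """Split on one literal space; repeated spaces yield empty strings."""
    return flags.split(" ")


def split_on_whitespace(flags: str) -> List[str]:
    """Split on runs of whitespace, dropping leading/trailing whitespace."""
    return flags.split()


if __name__ == "__main__":
    sample = "  -O2   -g -I/opt/include  "
    # The first list contains empty strings that would become spurious
    # arguments if passed to a compiler; the second contains only the flags.
    print(split_on_single_space(sample))
    print(split_on_whitespace(sample))
```

Passing the first kind of list to an exec-style call would hand the compiler empty arguments, which is the class of problem the commit message says it prevents.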
Jon Mason
c2351d39f1 Fix Ticket #1359
If rdmacm cpc is disabled and an iwarp device is forced to be used, then
mca_btl_openib_get_iwarp_subnet_id will attempt to reference an uninitialized
list.  Modify the code to check for the uninitialized list and return zero.

Also, fix a memory leak caused by not freeing allocated address structs during
shutdown. 

This commit was SVN r18725.
2008-06-24 19:16:06 +00:00
George Bosilca
fbf341068a Remove the pending queue from the shared memory BTL. The PML is in charge of managing
the fragments that failed to be sent, so there is no need to replicate the same
mechanism in the BTL.
Force the SM BTL to empty all ack fragments in the component progress function.

This commit was SVN r18724.
2008-06-24 19:01:26 +00:00
Jeff Squyres
ea21c31f44 * MCA param btl_openib_use_eager_rdma can now override the
INI file use_eager_rdma value (fixes trac:1169)
 * fixed a typo in an MCA param help message
 * made the check for enabling short/eager RDMA more robust in the
   presence of progress threads; it now emits a show_help warning

This commit was SVN r18723.

The following Trac tickets were found above:
  Ticket 1169 --> https://svn.open-mpi.org/trac/ompi/ticket/1169
2008-06-24 18:31:46 +00:00
Jeff Squyres
e0545460ff Fixes trac:1355: allow INI file to set max_inline_data value, and if not
specified, probe for max value supported by device.

This commit was SVN r18720.

The following Trac tickets were found above:
  Ticket 1355 --> https://svn.open-mpi.org/trac/ompi/ticket/1355
2008-06-24 17:18:07 +00:00
Terry Dontje
12baa72580 This commit fixes trac:1306
This commit was SVN r18718.

The following Trac tickets were found above:
  Ticket 1306 --> https://svn.open-mpi.org/trac/ompi/ticket/1306
2008-06-24 14:38:11 +00:00
George Bosilca
1eb62b6c48 Remove a warning. Close ticket #1357.
This commit was SVN r18717.
2008-06-24 14:23:02 +00:00
Jeff Squyres
3f95b906c5 Fixes trac:1085: Improves SLURM configure logic to also allow OS X or any platform where srun is found in the PATH.
This commit was SVN r18714.

The following Trac tickets were found above:
  Ticket 1085 --> https://svn.open-mpi.org/trac/ompi/ticket/1085
2008-06-23 23:12:55 +00:00
Lenny Verkhovsky
937380df2f Memory check after allocation in SM fixed
This commit was SVN r18706.
2008-06-22 14:52:44 +00:00
Jeff Squyres
51d833e8d1 Minor fixes and comment clarifications for MPI-2.1-mandated handling
of strings.  We mostly did the Right Things already; I simplified the
code a bit and also had us not write to more characters in the C
bindings than we're supposed to (per language in the MPI-2.1 spec).

Fixes trac:1238.

This commit was SVN r18705.

The following Trac tickets were found above:
  Ticket 1238 --> https://svn.open-mpi.org/trac/ompi/ticket/1238
2008-06-21 19:33:47 +00:00
Jeff Squyres
281c37afcc Ensure that the "empty" CPC components are ignored.
This commit was SVN r18702.
2008-06-21 11:39:53 +00:00
Jeff Squyres
807e2cc742 Mark a notable place where we need to return an error up to the BTL or
PML.

This commit was SVN r18701.
2008-06-20 22:11:49 +00:00
Jeff Squyres
5ded50df0e * Fix a > that should be ==
* Ensure to destroy the correct QP (local->id[num]->qp will always
   have a valid pointer in it, even if we set up a dummy qp)
 * Note two notable places where we need to figure out how to
   propagate errors up from the CPC to the main BTL / PML when errors
   occur.  Probably have the same issue in IBCM, too.

This commit was SVN r18700.
2008-06-20 22:09:30 +00:00
Jeff Squyres
0074126886 Per #1352, most iWARP adapters today cannot handle connections between
two processes on the same server (!).  So for today, we'll simply mark
all local processes that use iWARP adapters as "unreachable".

More details in #1352.

This commit was SVN r18699.
2008-06-20 22:08:00 +00:00
Jeff Squyres
f4145fce7a Ensure that we don't try to shut down a thread that is not [yet] there
(e.g., if you're excluding some devices, their destructors will be
invoked before the async event thread was set up for them).

This commit was SVN r18698.
2008-06-20 19:30:51 +00:00
Jeff Squyres
ed17b51204 Adjust the max_inline default size down so that it can be accepted on
multiple adapters (e.g., Chelsio T3).

But we need to figure out how to determine a good value for the
resident adapter(s) at runtime.  It's problematic because, for
example, Mellanox ConnectX and Chelsio T3 report max_inline values
differently at run-time.  If you ibv_create_qp with a max_inline value
of 0, ConnectX reports back a value that is a formula based on a few
other values (e.g., max_send_sge and max_recv_sge).  But T3 always
reports back "64".

We're looking into this to figure out the best way -- reducing the
default right now should allow other adapters to run while we figure
it out.

This commit was SVN r18697.
2008-06-20 18:24:04 +00:00
George Bosilca
54e7e03695 One less warning.
This commit was SVN r18695.
2008-06-20 17:50:19 +00:00
Jeff Squyres
f366f49179 Remove some leftover cruft; the STL is no longer used, so this lock is
no longer necessary.

This commit was SVN r18694.
2008-06-20 13:39:26 +00:00
Jeff Squyres
7a1206d912 Two more minor changes:
* Put the variable in the MPI namespace; keeps it safely segregated
   from user apps
 * Need to actually "extern" the variable to make the compiler not
   complain that the variable is never referenced

This commit was SVN r18693.
2008-06-20 13:36:02 +00:00
Jeff Squyres
7905db57bd Slightly decrease the number of buffers for the NetXen adapter
This commit was SVN r18691.
2008-06-20 01:00:22 +00:00
Pak Lui
119df10349 Fix the debugging messages
This commit was SVN r18686.
2008-06-19 18:54:20 +00:00
Pak Lui
a924b4a7f4 Define the symbols to allow parallel debuggers to dlopen
the shared object when it is compiled with the Sun Studio C compiler.
Depending on where the extern variables included in the headers were
initialized, there can be cases where no storage is allocated for the
variables, and therefore the symbols may or may not be defined when the
debugger tries to dlopen this message queue DLL.

This commit was SVN r18685.
2008-06-19 18:49:25 +00:00
George Bosilca
bc9b950162 Honor ^ for the PML selection.
This commit was SVN r18683.
2008-06-19 16:50:46 +00:00
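In MCA component selection lists, a leading ^ turns a comma-separated list into an exclusion list. A minimal Python sketch of that rule follows. It is not the PML selection code: `select_components` is a hypothetical helper, and the component names are used purely as illustrative strings.

```python
def select_components(spec, available):
    """Return the entries of `available` permitted by `spec`.

    `spec` is a comma-separated list of component names. A leading "^"
    negates the whole list: every name in it is excluded and everything
    else is kept. An empty spec keeps all components.
    """
    spec = spec.strip()
    if not spec:
        return list(available)

    negate = spec.startswith("^")
    names = {n.strip() for n in spec.lstrip("^").split(",") if n.strip()}

    if negate:
        return [c for c in available if c not in names]
    return [c for c in available if c in names]


available = ["ob1", "cm", "v"]
print(select_components("^cm", available))     # exclusion list
print(select_components("ob1,cm", available))  # inclusion list
```

Treating the caret as a property of the whole list, rather than of a single name, keeps the two modes from being mixed in one specification.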
Ralph Castain
265b4de5de Ensure that the call to orte_routed is properly protected at compile time when RTE support is disabled
This commit was SVN r18681.
2008-06-19 15:20:06 +00:00
Jeff Squyres
a884eebdf1 This warning has been bugging me in MTT nightly runs for forever: make
the char string ident in the C++ library be non-static so that other
places can see it.  This makes the C++ library version string
analogous to all the other version strings.

This commit was SVN r18679.
2008-06-19 14:40:37 +00:00
Ralph Castain
3b5e80fa61 Shift responsibility for preconnecting the oob to the orte routed framework, which is the only place that knows what needs to be done. Only the direct module will actually do anything - it uses the same algo as the original preconnect function.
This commit was SVN r18677.
2008-06-19 13:48:26 +00:00
Pavel Shamis
4537827973 Making the qp allocation more optimized.
- sq parameter was replaced with max_inline parameter
- inline is allocated only for relevant QPs

This commit was SVN r18675.
2008-06-19 08:40:39 +00:00
Ralph Castain
955d117f5e Add a new grpcomm module that mimics the old 1.2 behavior - it -always- does a modex because it always includes the architecture. Hence, we called it "blind-and-dumb" since it doesn't look to see if this is required - moniker of "bad". :-)
Update the ESS API so we can update the stored arch's should the modex include that info. Update ompi/proc to check/set the arch for remote procs, and add that function call to mpi_init right after the modex is done.

Setup to allow other grpcomm modules to decide whether or not to add the arch to the modex, and to detect if other entries have been made. If not, then the modex can just fall through. Begin setting up some logic in the "basic" module to handle different arch situations.

For now, default to the "bad" module so we will work in all situations, even though we may be sending around more info than we really require.

This fixes ticket #1340

This commit was SVN r18673.
2008-06-18 22:17:53 +00:00
Jeff Squyres
da6aa57efb Change up the logic a bit to handle the Red Storm case a bit better
(still need the real yod environment variable name), and add in some
lengthy comments explaining the two different methods of debugger
attach that we're using.

MPI handle debugging doesn't seem to be working at the moment; still
checking into it...

This commit was SVN r18672.
2008-06-18 21:33:08 +00:00
Ralph Castain
282a220e7e Update the debugger interface per email thread with Jeff and Brian. Handoff to them for final test and validation
This commit was SVN r18670.
2008-06-18 15:28:46 +00:00
George Bosilca
1dba362a01 Make the ompi_ddt_dump function externally visible.
This commit was SVN r18668.
2008-06-18 08:37:43 +00:00
George Bosilca
8e7c35e76c These symbols are only available via the module/component structure, so they
don't have to be globally visible.

This commit was SVN r18666.
2008-06-18 08:20:02 +00:00
Ralph Castain
0532d799d6 Complete implementation of the --without-rte-support configure option. Working with Brian, this has been tested on RedStorm.
Some minor changes to help facilitate debugger support so that both mpirun and yod can operate with it. Still to be completed.

This commit was SVN r18664.
2008-06-18 03:15:56 +00:00
Jeff Squyres
d0cfca5990 Documentation describing how TV (and others like it) attach to MPI
processes.  Originally downloaded from
http://www-unix.mcs.anl.gov/mpi/mpi-debug/mpich-attach.txt -- cached
here in case that file ever disappears someday.

This commit was SVN r18663.
2008-06-17 21:19:34 +00:00
Jeff Squyres
1e00d26c61 Minor wordsmithing
This commit was SVN r18660.
2008-06-16 19:56:13 +00:00
Lenny Verkhovsky
f4811d6c4d NUMA Awareness support. Gleb's patch
This commit was SVN r18658.
2008-06-15 13:43:28 +00:00
Brian Barrett
7712b07ac4 Add Perl-based wrapper compilers for cross-compile environments. The default
is still to use the C-based wrapper compilers (which have many more features
and are better tested).  The Perl compilers are enabled with the option
--enable-script-wrapper-compilers, which also ignores the option
--disable-binaries (i.e., --enable-script-wrapper-compilers --disable-binaries
will result in Perl-based wrapper compilers being installed, but no other
binaries being installed).

This commit was SVN r18655.
2008-06-13 22:52:25 +00:00
Brian Barrett
79ad6d983e - The ptmalloc2 memory manager component is now by default built as
a standalone library named libopenmpi-malloc.  Users wanting to
  use leave_pinned with ptmalloc2 will now need to link the library
  into their application explicitly.  All other users will use the
  libc-provided allocator instead of Open MPI's ptmalloc2.  This change
  may be overridden with the configure option enable-ptmalloc2-internal
- The leave_pinned options will now default to using mallopt on
  Linux in the cases where ptmalloc2 was not linked in.  mallopt
  will also only be available if munmap can be intercepted (the
  default whenever Open MPI is not compiled with
  --without-memory-manager).
- Open MPI will now complain and refuse to use leave_pinned if
  no memory intercept / mallopt option is available.

This commit was SVN r18654.
2008-06-13 22:32:49 +00:00
Galen Shipman
44cd373a87 I also forgot to initialize the convertor max_data; George probably copied
this dumb mistake from me.

This commit was SVN r18653.
2008-06-13 18:33:43 +00:00
George Bosilca
170b9c344e Mea culpa. I forgot to initialize the max_data before the call to the
convertor.

This commit was SVN r18651.
2008-06-12 17:24:39 +00:00
Pavel Shamis
dc3f14736d Fixing QP initialization stuff.
This commit was SVN r18650.
2008-06-11 16:31:39 +00:00
Matthias Jurenz
a9ff2b84f2 Bugfix (Ticket #1318): Implemented the copy constructor of 'FiltHandlerArgument' in 'vt_filthandler.cc' instead of the header file 'vt_filthandler.h'
This commit was SVN r18637.
2008-06-10 09:03:21 +00:00
Galen Shipman
a239877b78 revert my previous boneheadedness
This commit was SVN r18634.
2008-06-10 01:19:04 +00:00
George Bosilca
dc0ab0d0a8 Enable the sendi path.
This commit was SVN r18633.
2008-06-09 23:03:56 +00:00
Galen Shipman
4ef4a9520f remove showhelp..
This commit was SVN r18628.
2008-06-09 20:53:01 +00:00
Aurelien Bouteiller
ebe6df4c06 Moving the pml_v_output global variable inside the pml_v structure. This should avoid one of the missing symbols when visibility is enabled.
This commit was SVN r18627.
2008-06-09 20:38:44 +00:00
Ralph Castain
c13cadc3c7 Refs trac:1255
This commit repairs the debugger initialization procedure. I am not closing the ticket, however, pending Jeff's review of how it interfaces to the ompi_debugger code he implemented. There were duplicate symbols being created in that code, but not used anywhere. I replaced them with the ORTE-created symbols instead. However, since they aren't used anywhere, I have no way of checking to ensure I didn't break something.

So the ticket can be checked by Jeff when he returns from vacation... :-)

This commit was SVN r18625.

The following Trac tickets were found above:
  Ticket 1255 --> https://svn.open-mpi.org/trac/ompi/ticket/1255
2008-06-09 20:34:14 +00:00
Galen Shipman
9efbec0383 fix normal send path
remove unneeded checks

This commit was SVN r18624.
2008-06-09 20:25:27 +00:00
Galen Shipman
dbd282fcad doh.. fix GET protocol..
This commit was SVN r18623.
2008-06-09 19:45:44 +00:00
Ralph Castain
9613b3176c Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP.
After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach.

I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive.

This commit was SVN r18619.
2008-06-09 14:53:58 +00:00
Jeff Squyres
c087b4cd4f * Revert r18067
* Add specific comments about why we're not setting MPI_ERROR here

This commit was SVN r18616.

The following SVN revision numbers were found above:
  r18067 --> open-mpi/ompi@58e31d767e
2008-06-07 02:44:10 +00:00
George Bosilca
2aec094d56 The PML V is a component so it should use OMPI_MODULE_DECLSPEC.
This commit was SVN r18610.
2008-06-06 17:43:57 +00:00
George Bosilca
ae7bca2f4a Update the MPI_ERROR field as well.
This commit was SVN r18607.
2008-06-06 15:53:17 +00:00
Josh Hursey
1de50b523c Fix some Coverity 'Event set_but_not_used' highlights.
Thanks to Jeff for bringing them to my attention.

This commit was SVN r18606.
2008-06-06 14:38:41 +00:00
Jeff Squyres
1a748bc7be First cut at the NetEffect NE020 NIC.
This commit was SVN r18599.
2008-06-05 20:24:24 +00:00
Jeff Squyres
9109f7126a Per CID 988, free some memory that would be leaked in an error condition.
This commit was SVN r18597.
2008-06-05 20:04:38 +00:00
Jeff Squyres
f0d465c30a Slightly simplify the code and remove a compiler warning.
This commit was SVN r18596.
2008-06-05 19:08:08 +00:00
Jeff Squyres
b1999bbba3 * Use inclusive NIC/HCA language
* Add a description of receive_queues

This commit was SVN r18595.
2008-06-05 19:07:22 +00:00
Pavel Shamis
7b9024bc05 Updating Mellanox's Copyright in files touched in 2008
This commit was SVN r18592.
2008-06-05 13:40:26 +00:00
Ralph Castain
6ddcce4085 Apply a patch from Edgar to fix the Intercomm MTT tests.
Fixes ticket #1332

This commit was SVN r18591.
2008-06-05 12:53:12 +00:00
Pavel Shamis
379e00050c Fixing openib btl finalize flow. Bug fix for #1286.
This commit was SVN r18590.
2008-06-05 12:20:13 +00:00
Jeff Squyres
91a281080a Fix a compiler warning for a case that would never really happen
anyway.  Rename a variable to be a bit more descriptive.

This commit was SVN r18585.
2008-06-04 19:10:23 +00:00