164 Commits

Author SHA1 Message Date
Mike Dubman
9ffeeb69d9 fix help message
This commit was SVN r25364.
2011-10-25 14:02:43 +00:00
Samuel Gutierrez
663f4546f5 fix define typo in psm mtl.
This commit was SVN r25362.
2011-10-24 18:38:12 +00:00
Brian Barrett
d8b5b544ad Update list name to match change in spec
This commit was SVN r25273.
2011-10-12 20:09:39 +00:00
Mike Dubman
7a9ae43276 added support for shared memory transport in mxm
This commit was SVN r25220.
2011-10-03 12:59:55 +00:00
Brian Barrett
fc29ffebdb * remove two aborts that aren't necessary
This commit was SVN r25214.
2011-09-29 22:27:23 +00:00
Brian Barrett
14f32a1a54 * Clean up progress function
* Only print returnable errors when verbose=1.  Still print errors when
  we're going to abort, since those obviously aren't returnable

This commit was SVN r25213.
2011-09-29 22:26:33 +00:00
Brian Barrett
758f8a4d87 * More debugging output
* Make recv short block events use the callback mechanism so that we can
  add overflow debugging

This commit was SVN r25212.
2011-09-29 21:59:48 +00:00
Brian Barrett
c08ea5c0f5 Set options correctly for the two pts
This commit was SVN r25211.
2011-09-29 21:56:37 +00:00
Brian Barrett
05f800abae Properly unpack data for long unexpected
This commit was SVN r25210.
2011-09-29 17:25:45 +00:00
Brian Barrett
bb9e73232a * Leverage hdr_data and opcount to improve debugging
* Clean up handling of short synchronous messages

This commit was SVN r25208.
2011-09-28 21:18:47 +00:00
Brian Barrett
71d8300607 * Fix name clash with macros in mtl_portals4.h
* hdr_data now includes opcount and length for all messages, which is the match
  bits for long and rndv messages
* Re-add probe implementation 

This commit was SVN r25207.
2011-09-28 16:53:01 +00:00
Brian Barrett
2fb8045fad clean up printfs
This commit was SVN r25206.
2011-09-28 15:28:46 +00:00
Brian Barrett
26e781f002 * Remove triggered code for now
* Move from per-endpoint send/recv count to just send side op count

This commit was SVN r25205.
2011-09-28 15:25:39 +00:00
Brian Barrett
592c1ab6db * revert probe and size information changes, since it seems to break everything
This commit was SVN r25204.
2011-09-28 14:57:19 +00:00
Brian Barrett
211b5c7824 * Make triggered protocol only work for non-wildcard receives
* Always encode length in header data to make probe work
* General send/receive cleanups
* Implement iprobe

This commit was SVN r25197.
2011-09-27 22:45:00 +00:00
Brian Barrett
77c560be42 updates to match new api changes
This commit was SVN r25196.
2011-09-27 20:38:22 +00:00
Mike Dubman
98f382ba0e fixes in mxm mtl
This commit was SVN r25066.
2011-08-19 22:18:17 +00:00
Mike Dubman
e3c869d83b fix double free
This commit was SVN r25041.
2011-08-10 05:47:55 +00:00
Mike Dubman
a751cd93d3 improve debug macro availability
This commit was SVN r25022.
2011-08-09 10:54:08 +00:00
Mike Dubman
bfd75de6f9 fix selection logic: if no suitable device is found, disqualify mxm without complaint.
This commit was SVN r25021.
2011-08-09 07:09:37 +00:00
Mike Dubman
1d3f5e1314 better mxm selection mechanism, some refactoring
This commit was SVN r25005.
2011-08-07 12:06:49 +00:00
Mike Dubman
7b18ab2fa9 remove unused includes
This commit was SVN r24980.
2011-08-03 07:07:29 +00:00
Mike Dubman
45ea375531 code and readme updates, some refactoring
This commit was SVN r24977.
2011-08-02 14:30:11 +00:00
Mike Dubman
aefffa073d initial implementation of MXM MTL layer
This commit was SVN r24946.
2011-07-26 04:36:21 +00:00
Mike Dubman
fd17f20ed5 Currently MTLs do not handle communicator contexts in any special way;
they only add the context id to the tag selection of the underlying
messaging mechanism.

We would like to enable an MTL to maintain its own context data
per communicator. This way an MTL will be able to queue incoming eager
messages and rendezvous requests on a per-communicator basis.

The MTL will be allowed to override the comm->c_pml_comm member,
since it's unused in pml_cm anyway.

This commit was SVN r24858.
2011-07-06 18:25:49 +00:00
Brian Barrett
e8817f3f63 * Don't send acks for expected triggered messages; still need to get the rest of the data
* Don't ask for UNLINK events for persistent long unexpected ME or the get MEs.

This commit was SVN r24814.
2011-06-23 16:21:10 +00:00
Brian Barrett
09d89242d6 Crank up the number of short receive blocks so that we're unlikely to hit the flow
control case.  Uses about same amount of memory as the Portals 3.3 implementations

This commit was SVN r24782.
2011-06-16 21:58:53 +00:00
Brian Barrett
4fec0c198d Update short recv blocks to properly set up for triggered operations (where
they also store the triggered start message)

This commit was SVN r24777.
2011-06-16 16:51:59 +00:00
Brian Barrett
83154af74d Check return codes a bit more closely
Fix broken debug output in any_source recv case
Other minor code cleanups

This commit was SVN r24774.
2011-06-13 15:18:55 +00:00
Brian Barrett
a7c682cdb0 Fix starting buffer point for triggered get. Should be after the eager part of the
message

This commit was SVN r24752.
2011-06-06 17:08:13 +00:00
Brian Barrett
b778d785fb Add some debugging output and fix some places where the output id and
verbosity level were swapped

This commit was SVN r24740.
2011-06-01 17:20:18 +00:00
Brian Barrett
37d5c7e2ca * Add ability to set long protocol with MCA parameter
* Instead of static arrays of send/recv counts, put them in the endpoint

This commit was SVN r24735.
2011-05-26 21:53:39 +00:00
Brian Barrett
beb1bc70b2 * Add support for using modex to exchange NID/PID pairs when using Portals4.
  Rather than try to support a bunch of lightweight environments like I did
  with the Portals3 code, always use the "modex" and hack the grpcomm for
  the SHMEM implementation to return the right nid/pid for a remote
  process by "magic".

This commit was SVN r24733.
2011-05-25 22:10:27 +00:00
Brian Barrett
d8b7ea315e First take at implementing rndv and triggered protocols
This commit was SVN r24699.
2011-05-13 05:57:16 +00:00
Brian Barrett
43902221cc * Fix bad argument to PtlGet in long receive
* Fix bad params when configuring ME for long unexpected

This commit was SVN r24698.
2011-05-13 03:56:03 +00:00
Brian Barrett
3d4b7ecbaf updates for API changes
This commit was SVN r24628.
2011-04-20 16:48:27 +00:00
George Bosilca
6fc4c22037 Pedantic.
This commit was SVN r24460.
2011-02-25 00:29:48 +00:00
Brian Barrett
4859bb82e2 * Update to support direct call
* Add missing cancel (not that it does anything useful)
* Fix bug in opal_output call

This commit was SVN r24269.
2011-01-19 20:49:28 +00:00
Brian Barrett
6cf74eeb03 Fix bug in looking at convertor_unpack return code. Always print debug
on error message for now.

This commit was SVN r24163.
2010-12-10 22:36:47 +00:00
Brian Barrett
a26fadb26e Bring Portals4 updates back into the trunk
This commit was SVN r24154.
2010-12-07 20:11:25 +00:00
Nathan Hjelm
94d4aa7253 fixed wrong include
This commit was SVN r24133.
2010-12-01 23:10:12 +00:00
Greg Koenig
0694a3203b This was a small mistake introduced in r23925 in the changes to libevent.
This commit was SVN r24043.

The following SVN revision numbers were found above:
  r23925 --> open-mpi/ompi@fceabb2498
2010-11-11 21:54:28 +00:00
Jeff Squyres
64863d086c Add 2 new MCA params:
* mtl_mx_board: allow selection of specific MX NIC/board to use.
  <0 means "use any board".
* mtl_mx_endpoint: allow selection of specific MX endpoint to use.
  <0 means "use any endpoint".

This commit was SVN r23996.
2010-11-05 17:17:20 +00:00
Ralph Castain
fceabb2498 Update libevent to the 2.0 series, currently at 2.0.7rc. We will update to their final release when it becomes available. Currently known errors exist in unused portions of the libevent code. This revision passes the IBM test suite on a Linux machine and on a standalone Mac.
This is a fairly intrusive change, but outside of the moving of opal/event to opal/mca/event, the only changes involved (a) changing all calls to opal_event functions to reflect the new framework instead, and (b) ensuring that all opal_event_t objects are properly constructed since they are now true opal_objects.

Note: Shiqing has just returned from vacation and has not yet had a chance to complete the Windows integration. Thus, this commit almost certainly breaks Windows support on the trunk. However, I want this to have a chance to soak for as long as possible before I become less available a week from today (going to be at a class for 5 days, and thus will only be sparingly available) so we can find and fix any problems.

Biggest change is moving the libevent code from opal/event to a new opal/mca/event framework. This was done to make it much easier to update libevent in the future. New versions can be inserted as a new component and tested in parallel with the current version until validated, then we can remove the earlier version if we so choose. This is a statically built framework ala installdirs, so only one component will build at a time. There is no selection logic - the sole compiled component simply loads its function pointers into the opal_event struct.

I have gone thru the code base and converted all the libevent calls I could find. However, I cannot compile nor test every environment. It is therefore quite likely that errors remain in the system. Please keep an eye open for two things:

1. compile-time errors: these will be obvious as calls to the old functions (e.g., opal_evtimer_new) must be replaced by the new framework APIs (e.g., opal_event.evtimer_new)

2. run-time errors: these will likely show up as segfaults due to missing constructors on opal_event_t objects. It appears that it became a typical practice for people to "init" an opal_event_t by simply using memset to zero it out. This will no longer work - you must either OBJ_NEW or OBJ_CONSTRUCT an opal_event_t. I tried to catch these cases, but may have missed some. Believe me, you'll know when you hit it.

There is also the issue of the new libevent "no recursion" behavior. As I described in a recent email, we will have to discuss this and figure out what, if anything, we need to do.

This commit was SVN r23925.
2010-10-24 18:35:54 +00:00
Brian Barrett
9febaa475e * Add shell of functionality required for supporting Portals4
* Update places where orte-free builds have failed

This commit was SVN r23891.
2010-10-14 22:49:09 +00:00
Jeff Squyres
73bcc4a36b Fix mistake that came in via the ompi-agen tree in r23764. The mistake wasn't part of the core autogen upgrade; it was an additional 'bonus' cleanup. Oops. The mistake will always create a set of directories under installdir, even if you do not --with-devel-headers. The set of directories will be empty, but still -- they should not be there at all. This commit fixes that -- the directories are not created at all if you do not --with-devel-headers
This commit was SVN r23801.

The following SVN revision numbers were found above:
  r23764 --> open-mpi/ompi@40a2bfa238
2010-09-24 22:53:28 +00:00
Ralph Castain
40a2bfa238 WARNING: Work on the temp branch being merged here encountered problems with bugs in subversion. Considerable effort has gone into validating the branch. However, not all conditions can be checked, so users are cautioned that it may be advisable to not update from the trunk for a few days to allow MTT to identify platform-specific issues.
This merges the branch containing the revamped build system based around converting autogen from a bash script to a Perl program. Jeff has provided emails explaining the features contained in the change.

Please note that configure requirements on components HAVE CHANGED. For example, a configure.params file is no longer required in each component directory. See Jeff's emails for an explanation.

This commit was SVN r23764.
2010-09-17 23:04:06 +00:00
Rolf vandeVaart
0324fdb407 Created two new macros that are used when filling in either the
status structure or the _ucount field in the status structure.
On 64-bit sparc, the macros resolve into integer array assignments.
For all others, they are just simple assignments.  This fixes 
possible BUS errors seen when running on the SPARC processor.
This bug was introduced when the _count field changed from an int
into a size_t.  See the changes to request.h for additional details.

This commit fixes trac:2514.

This commit was SVN r23554.

The following Trac tickets were found above:
  Ticket 2514 --> https://svn.open-mpi.org/trac/ompi/ticket/2514
2010-08-04 19:36:40 +00:00
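
Note on 0324fdb407: the bus errors described there arise when a 64-bit field is stored through an address that is only guaranteed int alignment. The sketch below illustrates the general problem and one portable workaround. It is not the macros from that commit: it swaps in `memcpy` byte copies where the commit used integer array assignments, and `example_status`, `example_set_ucount`, `example_get_ucount` and `example_roundtrip_misaligned` are hypothetical names made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a status structure holding a size_t count. */
struct example_status {
    int source;
    int tag;
    int error;
    size_t ucount;
};

/* Store the count without assuming the status address is size_t-aligned.
 * A byte-wise copy never issues a single wide store to a misaligned address. */
void example_set_ucount(void *status, size_t value)
{
    memcpy((char *)status + offsetof(struct example_status, ucount),
           &value, sizeof(value));
}

/* Read the count back the same way. */
size_t example_get_ucount(const void *status)
{
    size_t value;
    memcpy(&value,
           (const char *)status + offsetof(struct example_status, ucount),
           sizeof(value));
    return value;
}

/* Round trip through storage that only guarantees int alignment, roughly
 * what happens when the caller supplies the status as an integer array. */
size_t example_roundtrip_misaligned(size_t value)
{
    int storage[1 + sizeof(struct example_status) / sizeof(int) + 1];
    void *status = &storage[1];   /* offset by one int from the array start */

    example_set_ucount(status, value);
    return example_get_ucount(status);
}
```

On platforms that tolerate misaligned accesses a plain assignment works as well, which matches the commit's description of simple assignments everywhere except 64-bit SPARC.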
George Bosilca
733d25a8a3 First step toward fixing the MPI_Get_count issues from ticket #2241. Next
step is the configure and Fortran mojo that Jeff will put in. Until then I
guess the Fortran interface is broken (at least all functions using the hidden
count field in the MPI_Status).

This commit was SVN r23467.
2010-07-21 20:07:00 +00:00
Jeff Squyres
c8bb7537e7 Remove include/opal/sys/cache.h -- its only purpose in life was to
#define CACHE_LINE_SIZE to 128.  This name has a conflict on NetBSD,
and it seems kinda odd to have a header file that ''only'' defines a
single value.  Also, we'll soon be raising hwloc to be a first-class
item, so having this file around seemed kinda weird.

Therefore, I replaced CACHE_LINE_SIZE with opal_cache_line_size, an
int (in opal/runtime/opal_init.c and opal/runtime/opal.h) on the
rationale that we can fill this in at runtime with hwloc info (trunk
and v1.5/beyond, only).  The only place we ''needed'' a compile-time
CACHE_LINE_SIZE was in the BTL SM (for struct padding), so I made a
new BTL_SM_ preprocessor macro with the old CACHE_LINE_SIZE value
(128).  That use isn't suitable for run-time hwloc information,
anyway.

This commit was SVN r23349.
2010-07-06 14:33:36 +00:00
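
Note on c8bb7537e7: the distinction drawn there is that struct padding needs a value known at compile time, while other uses can read a variable filled in at run time. A minimal sketch of that split follows. All identifiers are hypothetical (the commit names only `opal_cache_line_size` and a `BTL_SM_` prefix), and the value 128 is taken from the commit message.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

/* Compile-time constant: array sizes in a struct need a constant expression. */
#define EXAMPLE_CACHE_LINE_SIZE 128

/* Run-time value: a real implementation could overwrite this with a
 * value discovered at startup (the commit suggests hwloc as the source). */
int example_cache_line_size = 128;

/* Two counters kept on separate cache lines to avoid false sharing. */
struct example_fifo_ctl {
    volatile int head;
    char pad[EXAMPLE_CACHE_LINE_SIZE - sizeof(int)];
    volatile int tail;
};

static_assert(offsetof(struct example_fifo_ctl, tail) == EXAMPLE_CACHE_LINE_SIZE,
              "tail must start on the second cache line");

/* Round a byte count up to a multiple of the run-time line size. */
size_t example_round_up(size_t nbytes)
{
    size_t line = (size_t)example_cache_line_size;
    return (nbytes + line - 1) / line * line;
}
```

The padding cannot use `example_cache_line_size` because the struct layout is fixed when the code is compiled; the rounding helper can, because it runs after the value is known.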