Commit graph

2705 commits

Author SHA1 Message Date
Jeff Squyres
60f39a30f6 Revert r18409; that commit broke the build because it forgot to add
the btl_openib_iwarp.c and btl_openib_iwarp.h files.

This commit was SVN r18410.

The following SVN revision numbers were found above:
  r18409 --> open-mpi/ompi@056bbb68c8
2008-05-08 00:22:21 +00:00
Jon Mason
056bbb68c8 Abstract iWARP subnet ID functions
The iWARP subnet ID determination should not be in the RDMACM cpc, as
it was in the previous version, as this violates the cpc abstraction that is
present throughout the code.  Also, this patch uses the opal_list_t
data struct instead of using its own linked lists.

This commit was SVN r18409.
2008-05-07 23:59:43 +00:00
Ralph Castain
7c7b9b0486 Do a little cleanup on the opal graph class and opal carto framework to conform to OMPI naming conventions and avoid potential conflict with user applications - no change in functionality, passes carto test program
This commit was SVN r18407.
2008-05-07 19:33:49 +00:00
Jeff Squyres
157cea378f * A few fixes to make IP address and port number comparisons work properly
* A few indenting and style fixes

This commit was SVN r18405.
2008-05-07 16:56:07 +00:00
Jeff Squyres
bfae8ea828 The comment wasn't long enough; I felt the need to make it longer (and
explain a little more ;-) ).

This commit was SVN r18404.
2008-05-07 16:53:05 +00:00
Jeff Squyres
63abb3eb9b Clarify a comment / fix typos.
This commit was SVN r18402.
2008-05-07 14:51:36 +00:00
Shiqing Fan
8393fb5d47 Use the new memchecker_call function for memory checking of non-blocking communication.
This commit was SVN r18399.
2008-05-07 12:28:51 +00:00
Ralph Castain
ff70636024 Allgather_list needs its own tag to avoid conflicting with the allgather modex operation.
All spawned procs must decode the port of the spawning process so they can communicate in direct routed mode.

This fixes comm_spawn for all routing modes.

This commit was SVN r18395.
2008-05-07 03:03:56 +00:00
Rolf vandeVaart
0e32dd1022 Add MPI_Alltoallv to tuned collectives and add a pairwise implementation of MPI_Alltoallv. However, do not change the default behavior for now. The only way to use new pairwise implementation is via mca parameters.
This commit was SVN r18394.
2008-05-07 02:31:24 +00:00
Jon Mason
502d164908 Create subnet ID's for iWARP.
This enables subnet differentiation for iWARP devices, and rearranges
initialization so that the services are available when they are needed.

This commit was SVN r18393.
2008-05-06 22:43:52 +00:00
Jon Mason
9c724128f8 Handle no IP Address in rdmacm more resiliently
If there is no IP Address, have rdmacm log the correct error and let
another cpc have a go at it.  This is being done by splitting off the
IP address checking logic for the modex message creation, and having
it log the correct error in the error case.

This commit was SVN r18392.
2008-05-06 22:31:29 +00:00
Jon Mason
46bfd42c09 Fix compile warnings in rdmacm
Fix some reported compiler warnings and make the code a little prettier.

This commit was SVN r18391.
2008-05-06 22:19:28 +00:00
Jon Mason
9066168cd1 Prevent iWARP qp flush errors.
For iWARP, the TCP connection is tied to the QP once the QP is in RTS.  
And destroying the QP is thus tied to connection teardown for iWARP.  
This is a key distinction from IB, I think.   Anyway, to destroy the 
connection in iWARP you must move the QP out of RTS, either into CLOSING 
for a nice graceful close, or to ERROR if you want to be rude.  In both 
cases, all pending non-completed SQ and RQ WRs must be flushed.

This patch ignores all flush errors reaped by the cq and removes an
earlier attempt to work around this in the rdmacm cpc.

This commit was SVN r18388.
2008-05-06 21:57:40 +00:00
Josh Hursey
9971bc9d95 Merge in the mca_base_select changes per RFC:
http://www.open-mpi.org/community/lists/devel/2008/04/3779.php

{{{
svn merge -r 18276:18380 https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play .
}}}

Any components not in the trunk, but in one of the affected frameworks *must* be
updated. Contact the list, look at the RFC, or look at the diff for how to do this.

Sorry for the early commit of this, but I wanted to get it in today (per RFC) and
didn't know if I would have a chance later today.

This commit was SVN r18381.
2008-05-06 18:08:45 +00:00
Jeff Squyres
a06d4023b8 Oops -- missed one sys_errlist -> strerror().
This commit was SVN r18378.
2008-05-06 13:22:36 +00:00
Jeff Squyres
4154e587de strerror() is much better.
This commit was SVN r18376.
2008-05-05 21:06:07 +00:00
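The two commits above replace direct use of `sys_errlist` with `strerror()`. A minimal sketch of that pattern follows; the helper name `describe_errno` is hypothetical and is not taken from the Open MPI tree. Note that `strerror()` is not guaranteed to be thread-safe, so threaded code may prefer `strerror_r()`.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (an assumption for illustration, not Open MPI code):
 * writes "<what>: <error text>" into buf, using strerror() rather than
 * indexing the non-portable sys_errlist[] array directly.
 * Returns the value of snprintf(), i.e. the length that would be written. */
static int describe_errno(char *buf, size_t len, const char *what, int err)
{
    return snprintf(buf, len, "%s: %s", what, strerror(err));
}
```

A caller would typically pass the saved `errno` value, for example `describe_errno(msg, sizeof msg, "open", errno)`. The exact error text depends on the platform and locale.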
Shiqing Fan
f35a06119c Use memchecker_convertor_call function instead of the old one. Move the function to a place where we can use the convertor.
This commit was SVN r18370.
2008-05-05 13:57:27 +00:00
Jon Mason
a3bf503e01 Remove error on rdma cm
If there are multiple QP's, RDMACM will not send a message if the
qpnum != 0.  In doing so, it will log an error unnecessarily.  This
removes that.

This commit was SVN r18363.
2008-05-02 20:12:01 +00:00
Jon Mason
3989981578 Enable support of num_proc > num_nodes
Add the logic to support using port numbers, instead of simply using
the IP address of the sending node to determine which endpoint to
connect.  Since each process calls the cpc query function, it will
generate its own port to listen on, thus enabling this to work.

This commit was SVN r18362.
2008-05-02 16:20:28 +00:00
Jeff Squyres
ba5615a18f Merge in /tmp-public/cpc3 branch to trunk. oob/xoob still remains the
default CPC.

This commit was SVN r18356.
2008-05-02 11:52:33 +00:00
Donald Kerr
843a35094f adding local work queue accounting
This commit was SVN r18352.
2008-05-01 21:01:51 +00:00
George Bosilca
a69ac964df Allow any order in the list of Elan vpid.
This commit was SVN r18350.
2008-05-01 20:32:03 +00:00
Josh Hursey
dcd21d7d07 Some checkpoint/restart fixes in response to r18338 (changes in modex).
Things should be working now.

This commit was SVN r18348.

The following SVN revision numbers were found above:
  r18338 --> open-mpi/ompi@3e55fe6f6d
2008-05-01 17:48:13 +00:00
Ralph Castain
3e55fe6f6d Fold in the revised modex scheme. Move the ompi_proc_t modex portions to the RTE level since the daemons already have that info. Provide each process with the equivalent of a "nidmap" - both a map of what nodes are in the job, and a map of which node each process is on. This enables the use of static ports, though that hasn't been turned "on" in this commit.
Update the rsh tree spawn capability so we spawn the next wave of daemons before launching our own local procs.

Add an ability to encode nodenames for large clusters with contiguous node name numbering schemes - this allows communication of all node names in a few bytes instead of tens-of-bytes/node.

This commit was SVN r18338.
2008-04-30 19:49:53 +00:00
Pavel Shamis
61cc8843bf r17940 broke the XRC code.
The endpoint may be appended to a list during XOOB connection bring-up.

This commit was SVN r18328.

The following SVN revision numbers were found above:
  r17940 --> open-mpi/ompi@ebfdd133f5
2008-04-29 13:22:40 +00:00
Galen Shipman
ced88a338b include portals modex fun in the distro
This commit was SVN r18325.
2008-04-28 18:51:54 +00:00
Brad Penoff
c699236be2 updating SCTP BTL to configure properly with FreeBSD 7
This commit was SVN r18324.
2008-04-28 04:19:10 +00:00
George Bosilca
6e6c370917 Rollback r18274 as it's legal to have a sequence number smaller than the
expected one. It doesn't necessarily mean the message is duplicated,
it can simply signify the message is out of sequence and the counter
overflowed.

This commit was SVN r18323.

The following SVN revision numbers were found above:
  r18274 --> open-mpi/ompi@73c9de3af9
2008-04-27 18:35:54 +00:00
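The rollback above rests on the point that a numerically smaller sequence number may simply be an out-of-sequence message that arrived after the counter wrapped. A minimal sketch of a wraparound-aware comparison follows. The 16-bit counter width, the half-window rule, and all names here are assumptions for illustration, not the actual PML code.

```c
#include <stdint.h>

/* Hypothetical classification of a received sequence number relative to
 * the one expected next. */
enum seq_order { SEQ_EXPECTED, SEQ_AHEAD, SEQ_BEHIND };

static enum seq_order classify_seq(uint16_t received, uint16_t expected)
{
    /* Reducing the difference modulo 2^16 gives the forward distance from
     * expected to received, even when the counter has overflowed. */
    uint16_t distance = (uint16_t)(received - expected);

    if (distance == 0) {
        return SEQ_EXPECTED;
    }
    /* Treat anything within half the counter range as "ahead" (out of
     * sequence, possibly after overflow) and the rest as "behind". */
    return (distance < 0x8000u) ? SEQ_AHEAD : SEQ_BEHIND;
}
```

With these assumptions, a received value of 2 while 65534 is expected is classified as ahead, even though 2 < 65534 under a plain comparison.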
Aurelien Bouteiller
611d52fa95 Fix a bug that prevented using the same port (as returned by Open_port) for several Comm_accept calls.
This commit was SVN r18303.
2008-04-25 20:41:44 +00:00
Aurelien Bouteiller
c20b020ea6 Fix ticket #1275. The pml v can now be correctly deactivated on the configure command line. Also fix a dist target under some unusual circumstances.
This commit was SVN r18291.
2008-04-24 21:42:54 +00:00
Josh Hursey
2c736873bb Fix a checkpoint/restart bug that causes a restarted application to occasionally throw a SIGSEGV or SIGPIPE due to invalid socket descriptors.
The problem was caused by a bad ordering between the restart of the ORTE level tcp connections (in the OOB - out-of-band communication) and the Open MPI level tcp connections (BTLs). Before this commit ORTE would shutdown and restart the OOB completely before the OMPI level restarted its tcp connections. What would happen is that a socket descriptor used by the OMPI level on checkpoint was assigned to the ORTE level on restart. But the OMPI level had no knowledge that the socket descriptor it was previously using has been recycled so it closed it on restart. This caused the ORTE level to break as the newly created socket descriptor was closed without its knowledge.

The fix is to have the OMPI level shut down tcp connections, allow the ORTE level to restart, and then allow the OMPI level to restart its connections. This seems obvious, and I'm surprised that this bug has not cropped up sooner. I'm confident that this specific problem has been fixed with this commit.

Thanks to Eric Roman and Tamer El Sayed for their help in identifying this problem, and patience while I was fixing it.

 * Add a new state {{{OPAL_CRS_RESTART_PRE}}}. This state identifies when we are on the down slope of the INC (finalize-like) which is useful when you want to close, but not reopen a component set for fear of interfering with a lower level.
 * Use this new state in OMPI level coordination. Here we want to make sure to play well with both the OMPI/BTL/TCP and ORTE/OOB/TCP components.
 * Update ft_event functions in PML and BML to handle the new restart state.
 * Add an additional flag to the error output in OOB/TCP so we can see what the socket descriptor was on failure as this can be helpful in debugging.

This commit was SVN r18276.
2008-04-24 17:54:22 +00:00
George Bosilca
3ccac4f803 Oops ...
This commit was SVN r18275.
2008-04-24 15:54:52 +00:00
George Bosilca
73c9de3af9 Bark if we got a wrong sequence number. Here wrong means that the
seq number is smaller than what we expect.

This commit was SVN r18274.
2008-04-24 15:48:43 +00:00
Rich Graham
4d1ae7b05f accidentally made a change in the wrong place.
This commit was SVN r18262.
2008-04-23 17:32:05 +00:00
Rich Graham
293dd6ad4e add myself to list of people building this module.
This commit was SVN r18261.
2008-04-23 17:25:36 +00:00
Rich Graham
7658cc79e4 Pass in the correct module to the reduction call.
This commit was SVN r18260.
2008-04-23 17:23:30 +00:00
Adrian Knoth
c53d3c3c22 reverted r18169,r18170 due to connection reset by peer on odin/sif
This commit was SVN r18255.

The following SVN revision numbers were found above:
  r18169 --> open-mpi/ompi@20473bfda2
  r18170 --> open-mpi/ompi@d34dfbe12c
2008-04-23 15:26:15 +00:00
Josh Hursey
cc83d41ad9 Merge in tmp/jjh-scratch
{{{
 svn merge -r 18218:18240 https://svn.open-mpi.org/svn/ompi/tmp/jjh-scratch .
}}}

Contains:
 * Primarily a fix for a user reported problem where a cached file descriptor is causing a SIGPIPE on restart.
 * Cleanup some small memory leaks from using mca_base_param_env_var() - Thanks Jeff
 * Cleanup ORTE FT tool compilation in non-FT builds - Thanks Tim P.
 * Cleanup mpi interface with misplaced {{{OPAL_CR_ENTER_LIBRARY}}} - Thanks Terry
 * Some other sundry cleanup items all dealing with C/R functionality in the trunk.

This commit was SVN r18241.
2008-04-23 00:17:12 +00:00
Tim Mattox
0215474cb8 Fix two bugs in coll_sm_module.c from bit-rot:
Fixed a selection bug, and removed a bogus "free(proc)" call
which ultimately caused MPI_Finalize to crash.

This commit was SVN r18235.
2008-04-22 18:41:21 +00:00
Jeff Squyres
c40740947f Fix minor spelling error.
This commit was SVN r18229.
2008-04-22 13:11:50 +00:00
Galen Shipman
27c425b304 make portals level ack's optional (require ACK by default)
This commit was SVN r18228.
2008-04-21 22:22:18 +00:00
Rich Graham
df35223603 add selection logic for barrier and reduce.
This commit was SVN r18215.
2008-04-19 22:40:04 +00:00
Rich Graham
bee8b42f29 remove debug code that would not let people run.
Add infrastructure for blocking-barrier.

This commit was SVN r18214.
2008-04-19 01:34:04 +00:00
Galen Shipman
92e3b8671f nasty memory bug...
This commit was SVN r18207.
2008-04-18 03:01:53 +00:00
Ralph Castain
fa082cafa9 Shift the architecture calculation from the ompi/datatype engine to the opal/util area. This allows us to compute the architecture earlier in the launch and communicate it outside of the modex.
Note: this is an early preliminary step in the movement of portions of the datatype engine to the opal layer.

This commit was SVN r18198.
2008-04-17 20:43:56 +00:00
Tim Prins
eb94fa48ce the port name is only relevant at the root, so only look at it there.
This commit was SVN r18188.
2008-04-17 12:37:10 +00:00
Tim Prins
3582e11200 cleanup some warnings on 32 bit systems
This commit was SVN r18187.
2008-04-17 12:25:05 +00:00
Rich Graham
6c77fa4921 add a blocking shared memory algorithm.
This commit was SVN r18185.
2008-04-16 22:10:23 +00:00
Ralph Castain
7b91f8baff Cleanup and fix bugs in the MPI dynamics section. Modify the dpm API so it properly takes ports instead of process names (as correctly identified by Aurelien). Fix race conditions in the use of ompi-server. Fix incompatibilities between the mpi bindings and the dpm implementation that could cause segfaults due to uninitialized memory.
Fix the ompi-server -h cmd line option so it actually tells you something!

Add two new testing codes to the orte/test/mpi area: accept and connect.

This commit was SVN r18176.
2008-04-16 14:27:42 +00:00
Shiqing Fan
1c4c7e0f2f Add memchecker support for osc rdma communication.
This commit was SVN r18173.
2008-04-16 13:29:55 +00:00
Shiqing Fan
79da2fdd2c Use the new memchecker convertor function.
Remove some unnecessary memchecker calls.

This commit was SVN r18172.
2008-04-16 13:24:35 +00:00
Adrian Knoth
d34dfbe12c fixed misleading comment.
This commit was SVN r18170.
2008-04-16 11:26:15 +00:00
Adrian Knoth
20473bfda2 on incoming connections, compare with every possible source address.
Rationale (taken from the code):

    /* This is PITA. We never know which source address an 
    * incoming/outgoing packet will have, so even with 
    * btl_tcp_if_include/exclude on the remote end, we 
    * might get a different source address. 
    * 
    * If this address isn't included in btl_proc->proc_addrs, 
    * we would erroneously drop the connection 
    */ 

merge -r18165:18167 to the trunk.

This commit was SVN r18169.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r18165
  r18167
2008-04-16 11:24:09 +00:00
Adrian Knoth
e981a259bb btl_tcp_disable_family=4 and btl_tcp_disable_family=6 are mutually
exclusive, so this should result in "unreachable" when set differently
between peers.

This commit was SVN r18168.
2008-04-16 10:14:58 +00:00
Adrian Knoth
75c54616c7 renamed opal_sockaddr2str to opal_net_get_hostname for WANT_PEER_DUMP=1
This commit was SVN r18154.
2008-04-15 19:23:47 +00:00
Jeff Squyres
72af302360 Remove unused variable.
This commit was SVN r18151.
2008-04-15 14:58:32 +00:00
Aurelien Bouteiller
0f311ed824 Make sure the function returns NULL when no elan adapter is available instead of a random value.
This commit was SVN r18136.
2008-04-11 21:03:01 +00:00
Aurelien Bouteiller
20592cbcbf Fixes a warning about mallocing 0 bytes when no elan adapter is available.
This commit was SVN r18135.
2008-04-11 20:59:12 +00:00
Rich Graham
249445d61f added reduce-scatter followed by gather to root.
This commit was SVN r18133.
2008-04-11 13:49:08 +00:00
Rich Graham
a6bdbfab97 implement allreduce as reduce-scatter, followed by an allgather.
This commit was SVN r18132.
2008-04-11 04:06:29 +00:00
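The commit above describes allreduce as a reduce-scatter followed by an allgather. The two phases can be sketched with a single-process simulation in which each row of an array stands in for one rank's buffer. The rank count, buffer size, sum operation, and function name are assumptions for illustration; this is not the shared-memory implementation from the commit.

```c
#include <stddef.h>

#define NPROCS 4 /* number of simulated ranks (assumption) */
#define COUNT  8 /* elements per rank; must be divisible by NPROCS */

/* Single-process simulation: data[r] plays the role of rank r's buffer.
 * On return, every row holds the element-wise sum over all ranks. */
static void allreduce_sum_sim(int data[NPROCS][COUNT])
{
    const size_t chunk = COUNT / NPROCS;
    int reduced[COUNT]; /* chunk r of this array is "owned" by rank r */

    /* Phase 1, reduce-scatter: rank r reduces only its own chunk r,
     * combining the matching chunk from every rank. */
    for (size_t r = 0; r < NPROCS; r++) {
        for (size_t i = r * chunk; i < (r + 1) * chunk; i++) {
            int sum = 0;
            for (size_t src = 0; src < NPROCS; src++) {
                sum += data[src][i];
            }
            reduced[i] = sum;
        }
    }

    /* Phase 2, allgather: every rank collects the reduced chunk owned by
     * each rank, so all ranks end up with the complete result. */
    for (size_t dst = 0; dst < NPROCS; dst++) {
        for (size_t i = 0; i < COUNT; i++) {
            data[dst][i] = reduced[i];
        }
    }
}
```

The appeal of this decomposition is that each rank reduces only `COUNT / NPROCS` elements in the first phase instead of the whole buffer.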
Jon Mason
08ead87604 Potential double free of locks
mca_btl_openib_endpoint_post_rr_nolock is freeing the endpoint lock on
the error case, but most/all of the functions calling this free the lock
regardless of its error case, thus resulting in a double free of the
lock.

This commit was SVN r18131.
2008-04-10 21:15:01 +00:00
Rich Graham
70f3aab5f2 remove some code that is not needed.
This commit was SVN r18128.
2008-04-10 17:32:04 +00:00
Rich Graham
5c7db1e315 remove 2 race conditions in the buffer recycling logic.
This commit was SVN r18127.
2008-04-10 17:20:52 +00:00
Edgar Gabriel
4964434205 reverting commit 18122, since the commit was executed accidentally in the
wrong directory. The UH copyrights do belong in this file (i.e. because of
the fix which is in the 1.2 branch, the UH copyright notes are in the header
there already), but I want to have the proper log for that.

This commit was SVN r18124.
2008-04-10 15:09:31 +00:00
Edgar Gabriel
f87830767a the verification of recvcount==0 and rank = root was breaking
inter-communicator scatter, since the root (root==MPI_ROOT) might very well
have recvcount=0. The same fix has been applied to gather.c just the other way
round. 
 
Fixes the bug reported on the mailing list by Martin Audet. If there is a
1.2.7, this fix might be worth porting over.

Please note, that while the test works now for basic and for inter, we get a
0byte malloc warning from the inter module, which we still have to fix in a
separate patch.

This commit was SVN r18122.
2008-04-10 14:58:51 +00:00
Ralph Castain
3a0d09300b Fully implement the inbound binomial allgather for daemon-based collectives. Supports both modex and barrier operations.
Comm_spawn still uses the rank=0 method - shifting that algo to the daemons is under study.

This commit was SVN r18115.
2008-04-09 22:10:53 +00:00
Rich Graham
c6783549ef getting old
This commit was SVN r18110.
2008-04-09 16:55:16 +00:00
Rich Graham
1a20c3ce51 more debug.
This commit was SVN r18109.
2008-04-09 16:19:52 +00:00
Rich Graham
e7e18303f6 more debug.
This commit was SVN r18108.
2008-04-09 15:10:58 +00:00
Rich Graham
b14c6b17d5 adding debug output.
This commit was SVN r18107.
2008-04-09 13:32:01 +00:00
Rich Graham
10434fb2f1 add barrier synchronization at the end of the module init, to
avoid initializing shared memory variables in use.

This commit was SVN r18105.
2008-04-09 03:44:40 +00:00
Rich Graham
19bb1a2e86 fix initialization bug.
This commit was SVN r18104.
2008-04-08 23:34:06 +00:00
Donald Kerr
38e298cc9a report error message in all libs, not just debug
This commit was SVN r18103.
2008-04-08 22:58:28 +00:00
Rich Graham
a69a8d9626 initialize the flags.
This commit was SVN r18102.
2008-04-08 22:16:39 +00:00
Rich Graham
8765a2bbdd more debug code.
This commit was SVN r18101.
2008-04-08 20:38:20 +00:00
Rich Graham
08becf33b5 add more debugging.
This commit was SVN r18100.
2008-04-08 18:44:50 +00:00
Rich Graham
aa1b7dd406 more debug
This commit was SVN r18099.
2008-04-08 03:56:47 +00:00
Rich Graham
0c18bdeff7 more debug code.
This commit was SVN r18098.
2008-04-08 03:04:20 +00:00
Rich Graham
9d5a7238df Add some debugging code.
This commit was SVN r18097.
2008-04-07 23:20:15 +00:00
Rich Graham
fa696734d5 add some debug code.
This commit was SVN r18096.
2008-04-07 21:03:23 +00:00
Shiqing Fan
28746bbcdb Remove the memchecker macro in pml base request, used in req_wait.c, which actually is in the wrong place. Instead, one simple call from send_request_free and recv_request_free(already done) will do all the work, fast and clean.
This commit was SVN r18095.
2008-04-07 17:46:50 +00:00
Shiqing Fan
a1e5df1cc9 Use the new memchecker function call which is based on convertor.
Remove one unnecessary call.

This commit was SVN r18085.
2008-04-07 07:52:04 +00:00
Gleb Natapov
713a27dc71 Counter of created RDMA channels should be incremented immediately after channel
creation (not in control message completion) otherwise more than max_eager_rdma
channel may be created.

This commit was SVN r18082.
2008-04-06 13:48:45 +00:00
Rich Graham
1b54e8b76e fix buffer management for nb-barrier.
This commit was SVN r18081.
2008-04-05 21:59:04 +00:00
Tim Prins
313edd8955 - Fix a problem reported on the users list where we would segfault in finalize after calling spawn if the user did not call MPI_Comm_disconnect
- Fix the app context constructor so it initializes all the fields.

This commit was SVN r18079.
2008-04-04 15:07:39 +00:00
Jeff Squyres
7072a32703 * Properly protect XRC stuff
* A few minor style fixes

This commit was SVN r18076.
2008-04-02 19:52:03 +00:00
Rich Graham
94f8fd365c a few reduction optimizations. Add bcast.
This commit was SVN r18075.
2008-04-02 19:02:33 +00:00
George Bosilca
a00ca20446 More cleanups.
This commit was SVN r18069.
2008-04-02 06:38:33 +00:00
George Bosilca
944453c4c1 Cleanups.
This commit was SVN r18068.
2008-04-02 06:37:42 +00:00
Rich Graham
eb5d6096f1 add reduction routine - fix buffer recycling logic which was totally
broken.

This commit was SVN r18065.
2008-04-01 22:56:18 +00:00
Jeff Squyres
d944d5ec52 Just in case something goes drastically wrong, don't segv.
This commit was SVN r18049.
2008-03-31 21:55:07 +00:00
George Bosilca
b4f828f389 We need a newline at the end of the file, or some compilers bark.
This commit was SVN r18023.
2008-03-30 19:05:56 +00:00
Gleb Natapov
b42234461a Cleanup shared file creation on unix/linux.
This commit was SVN r18021.
2008-03-30 13:41:47 +00:00
Jeff Squyres
d0f12f3df0 Make a better error message.
This commit was SVN r18014.
2008-03-29 12:54:24 +00:00
Rich Graham
90e53ca9ee debug the pipeline algorithm.
This commit was SVN r18008.
2008-03-28 15:10:07 +00:00
Aurelien Bouteiller
77653ac787 Missing .h file in makefile broke nightly tarball distcheck...
This commit was SVN r18006.
2008-03-28 14:36:56 +00:00
Aurelien Bouteiller
c16339944a Fix a coverity warning about using unsafe sprintf.
This commit was SVN r17999.
2008-03-27 21:24:27 +00:00
Aurelien Bouteiller
e11237aadb Introduction of the "progress" sender_based method to replace the slow isend-self method.
This commit was SVN r17998.
2008-03-27 21:19:45 +00:00
Aurelien Bouteiller
93db01871e This is part of the previous patch.
This commit was SVN r17997.
2008-03-27 21:06:14 +00:00
Aurelien Bouteiller
f8bf6f2c6a Code cleanup.
sender_based.h is now split in two files, to solve cyclic .h files inclusion. 
Most macros are now inline functions.
Variable names have been changed in various places.
Various other small things... 

This commit was SVN r17996.
2008-03-27 21:05:44 +00:00
George Bosilca
be4b153f0d Another patch for thread safety in the TCP BTL (thanks to Pierre).
This commit was SVN r17993.
2008-03-27 18:36:08 +00:00
Gleb Natapov
cf40674369 Decide if sends should be throttled at the receiver and pass this to the sender
in an ACK message. The decision can't be done reliably at the sender.

This commit was SVN r17987.
2008-03-27 08:56:43 +00:00
Rich Graham
e2ad9c4be2 adjust to change in orte_process_info.
This commit was SVN r17986.
2008-03-27 01:25:28 +00:00
Rich Graham
441fb9fb9e checkpoint.
This commit was SVN r17985.
2008-03-27 01:16:32 +00:00
Ralph Castain
90107f3c14 Fix an issue with comm_spawn over who sent/recv first in the modex. The modex assumes that the first name on the list is the "root" that will serve as the allgather collector/distributor. The dpm was putting that entity last, which forced us to pre-inform the parent procs of the child proc's contact info since the parent was trying to send to the child.
Clarify the setting of send_first in the mpi bindings (trivial, I know, but helpful)

Remove the extra xcast of child contact info to the parent job.

This commit was SVN r17952.
2008-03-25 14:57:34 +00:00
Ralph Castain
cca449e379 Move an OMPI RML tag to the OMPI layer
This commit was SVN r17950.
2008-03-25 13:30:48 +00:00
Jeff Squyres
5320c91ab3 Oops -- fix the constructor to also use opal_object_t instead of
opal_list_item_t.

This commit was SVN r17945.
2008-03-25 11:59:50 +00:00
Galen Shipman
0116041133 BTL shouldn't own the passive side's descriptor in the PML get protocol. The BTL
doesn't know when to free it on the passive side. 

This commit was SVN r17943.
2008-03-25 01:43:41 +00:00
Jeff Squyres
ebfdd133f5 AFAICT, we never put endpoints on a list.
This commit was SVN r17940.
2008-03-24 18:32:55 +00:00
Ralph Castain
dc7f45dafd Remove the obsolete and largely unused orte_system_info structure. The only fields that were used in that struct were nodeid and nodename - these have been transferred to the orte_process_info structure.
Only one place used the user name field - session_dir, when formulating the name of the top-level directory. Accordingly, the code for getting the user's id has been moved to the session_dir code.

This commit was SVN r17926.
2008-03-23 23:10:15 +00:00
Rich Graham
a7c836a2b0 fix location of the restrict key word.
Make the tag in the fan-in/fan-out algorithm be fragment based.

This commit was SVN r17903.
2008-03-21 01:40:36 +00:00
Rich Graham
2c66d396b7 take care of some bit-rot with the fanin-fanout method.
This commit was SVN r17902.
2008-03-21 01:08:49 +00:00
Rich Graham
b9520e61dc get the sm optimized allreduce working for all but user defined
operations.  Added to the reduction operations a set of reduction
functions that take 2 input buffers and one output buffer to avoid
some extra memory copies.  These can't be used with user defined
operations.  The intel c collective suite passes both original, and
new (new, not the user defined operations).

This commit was SVN r17901.
2008-03-20 23:51:16 +00:00
Galen Shipman
dcac824f59 Fix problem in releasing fragments during GET_END event (didn't check that
portals btl has ownership and therefore didn't free the frag as it should) this
causes leakage and hangs in MPI_Finalize. 

Also added a bit more debugging. 

This commit was SVN r17900.
2008-03-20 22:46:32 +00:00
George Bosilca
efa89bfa3f Revert r17857. The context should be set in one case ... when we call prepare_{src|dst}
without calling a get or put. So, just keep it here until a better solution is
found.

This commit was SVN r17872.

The following SVN revision numbers were found above:
  r17857 --> open-mpi/ompi@d460ccfbf9
2008-03-18 19:01:27 +00:00
George Bosilca
8943ae0b4e Cleanup plus some typos.
This commit was SVN r17858.
2008-03-18 03:03:33 +00:00
George Bosilca
d460ccfbf9 No need to check for NULL there. The bml_btl is set correctly
on the upper level.

This commit was SVN r17857.
2008-03-18 03:02:31 +00:00
George Bosilca
39353ebb44 Cleanup.
This commit was SVN r17855.
2008-03-18 02:56:50 +00:00
George Bosilca
76deec135e The .h file is not used anymore (it contains the descriptor cache). Update the
Makefile.am file as well.

This commit was SVN r17854.
2008-03-18 02:50:24 +00:00
George Bosilca
1d04ec4ded Correct the connection logic for TCP. Now we have not only a cleaner
connection, but a more thread safe one. Thanks to Pierre for his
help on this.

This commit was SVN r17853.
2008-03-18 02:42:16 +00:00
Jeff Squyres
61290c0e51 Remove a useless file.
This commit was SVN r17852.
2008-03-18 01:50:47 +00:00
Ralph Castain
be7d0a8a4d Fix a problem introduced by the conversion of orte_pointer_array to opal_pointer_array. We used to derive the app context's index from the returned index of the orte_pointer_array_add function - this parameter was lost in the transition to opal_pointer_array_add. As a result, we no longer knew the index of the app_context, so everything is launched with app0.
This commit was SVN r17851.
2008-03-17 23:48:10 +00:00
Edgar Gabriel
570bbea5e0 fixing the allgather problem reported on the mailing list. The problem was
that at one location we had the local-size instead of the remote size as a
receive argument.

This commit was SVN r17849.
2008-03-17 19:42:18 +00:00
Gleb Natapov
9b6db25182 Fix compilation warning.
This commit was SVN r17839.
2008-03-17 13:37:57 +00:00
Pavel Shamis
54ad8d7446 The issue was reported/fixed by Jon Mason one month ago but the fix was not committed. So I'm committing it now.
This commit was SVN r17835.
2008-03-17 11:13:06 +00:00
Brad Penoff
be13b86fc5 Clarifying and fixing SCTP btl_sctp_if_11 parameter
This commit was SVN r17834.
2008-03-17 09:18:31 +00:00
Gleb Natapov
f488b94899 More SM BTL initialization cleanups.
This commit was SVN r17833.
2008-03-16 10:01:56 +00:00
Rich Graham
27182afb67 get the timers in correctly.
This commit was SVN r17832.
2008-03-16 03:25:16 +00:00
Rich Graham
afcd1016fd move temp buffer allocation out of the iteration loop - i.e. always use the
same temp loop.  The algorithm is rather synchronous already...

This commit was SVN r17831.
2008-03-16 03:20:46 +00:00
Rich Graham
a1766b29f6 fix some barrier addressing errors.
This commit was SVN r17830.
2008-03-15 22:46:19 +00:00
Rich Graham
0453e7d2f4 bug in management memory allocation - too much memory allocated.
This commit was SVN r17829.
2008-03-15 18:12:20 +00:00
Rich Graham
3c2f1eb8bf reduce the number of temp buffers used.
This commit was SVN r17828.
2008-03-15 17:23:04 +00:00
Rich Graham
0f9d642d51 temp buffer pointers are computed when they are set up. A bit more
efficient, but more important, it is much easier to play around with
memory layout now.

This commit was SVN r17827.
2008-03-15 16:36:35 +00:00
Rich Graham
e3e336b5ab check point
This commit was SVN r17826.
2008-03-15 13:31:21 +00:00
Jeff Squyres
6c77c995c2 Add missing dependencies in the static build case.
This commit was SVN r17825.
2008-03-15 12:11:36 +00:00
George Bosilca
5e229fe688 Thanks Ma for the patch. Correct the multi-rail support and
rename some fields to something more clear.

This commit was SVN r17824.
2008-03-14 19:17:28 +00:00
George Bosilca
ecebd5ae77 Update the Elan BTL to take in account multiple networks, and correctly deal
with the node position in the network.

This commit was SVN r17822.
2008-03-14 17:32:35 +00:00
Gleb Natapov
772772b944 Remove unneeded include.
This commit was SVN r17813.
2008-03-12 10:01:20 +00:00
Gleb Natapov
90c70e37b9 Clean up SM btl startup code. Remove no longer needed code leftovers from two
BTL times. Remove old and no longer correct comment.

This commit was SVN r17805.
2008-03-11 14:39:10 +00:00
Gleb Natapov
3a9652ffc4 Endpoint array may not exist if in add_proc() we failed to find suitable
btl for communication with a proc. Don't segfault in this case.

This commit was SVN r17804.
2008-03-11 08:13:37 +00:00
Gleb Natapov
ffa09c44fd Pass correct pointer to mpool_base function.
This commit was SVN r17795.
2008-03-09 13:22:12 +00:00
Gleb Natapov
b0b21c68b4 Remove trailing spaces from SM BTL.
This commit was SVN r17794.
2008-03-09 13:17:13 +00:00
Rich Graham
ebcf928c24 add some diagnostics.
This commit was SVN r17789.
2008-03-07 22:27:41 +00:00
Rich Graham
9131461511 move some test code to another machine.
This commit was SVN r17785.
2008-03-07 19:18:02 +00:00
Rich Graham
c230b65543 fix a couple of bugs. Recursive doubling seems to be working.
This commit was SVN r17777.
2008-03-07 02:51:38 +00:00
Rich Graham
70157166f9 checkpoint - compiles, now need to debug.
This commit was SVN r17775.
2008-03-07 00:39:59 +00:00
Ralph Castain
b110a247be Fix comm_spawn (maybe).
Comm_spawn was sticking during spawn_multiple because of a problem in the dpm - the modex there is asking processes to talk to each other in an allgather_list operation, but the procs don't have the required contact info to do so. The solution here was to ensure that all parent procs have full contact info for procs in the child job.

Admittedly, this isn't the long-term answer. We would like to have the contact info given to only the parent procs that were involved in the comm_spawn. There is a way to do that, but this will suffice to keep things working until that can be implemented and tested.

This commit was SVN r17772.
2008-03-06 21:56:00 +00:00
Rich Graham
4eace9d020 starting to implement recursive doubling algorithm.
This commit was SVN r17765.
2008-03-06 18:38:58 +00:00
Tim Prins
5de3e1965e Remove the orte_proc_table. Migrate all users of it to the opal_hash_table and a new name hash function in orte.
Everything should work, however I am unable to compile and test the sctp BTL.

This commit was SVN r17751.
2008-03-05 22:44:35 +00:00
Tim Prins
f9916811ae Make it so we do not mangle the options the user passes to their executable. Fixes trac:1124
The change also:
 - cleans up and simplifies the command line processing code
 - adds an error output if more than one hostfile passed for a single app context
 - gets rid of the superfluous orte_app_context_map_t type, and instead use a simple argv of -host options

This commit was SVN r17750.

The following Trac tickets were found above:
  Ticket 1124 --> https://svn.open-mpi.org/trac/ompi/ticket/1124
2008-03-05 22:12:27 +00:00
Donald Kerr
ef8f807c1c was not passing correct variable to dat_strerror
This commit was SVN r17749.
2008-03-05 21:45:16 +00:00
Josh Hursey
612ebdc2ac Cleanup some symbol visibility issues.
This commit was SVN r17733.
2008-03-05 13:59:25 +00:00
Jeff Squyres
597266fdec Present state of MPI debugger work:
* New/improved bootstrapping technique for DLLs
* First cut of the MPI handle debugging interface. It is still
  evolving, but the interface is getting more stable.
* Some minor bugs were fixed in the unity topo component (brought to
  light because of the new MPI handle debugging stuff).

Fixes trac:1209.

This commit was SVN r17730.

The following Trac tickets were found above:
  Ticket 1209 --> https://svn.open-mpi.org/trac/ompi/ticket/1209
2008-03-05 12:22:34 +00:00
Josh Hursey
3b4073e32c This commit fixes the checkpoint/restart functionality on the trunk. Included in this commit are:
* Extension to the ESS framework to support C/R
* Fixed support for {{{snapc_base_establish_global_snapshot_dir}}}
* Fixed FileM support
* Misc. minor code modifications

There are some outstanding visibility issues that I want to fix next.

This commit was SVN r17725.
2008-03-05 04:57:23 +00:00
Jeff Squyres
ea5c0cb4a2 Now that the nightly tarball has safely been made, let's try this
commit again.  Remove the svn:ignore from problematic directories and
try a merge from /tmp-public/plpa-merge-area2.

This commit was SVN r17718.
2008-03-05 02:45:15 +00:00
Galen Shipman
3a59cbd4a7 not sure how this got missed..
This commit was SVN r17710.
2008-03-05 01:23:43 +00:00
Christian Bell
987de57c9c Looks like orte/ns is now gone
This commit was SVN r17706.
2008-03-05 00:55:43 +00:00
Jeff Squyres
8189fcc7d5 Back out r17702; it went very badly.
This commit was SVN r17704.

The following SVN revision numbers were found above:
  r17702 --> open-mpi/ompi@3df754ebd7
2008-03-05 00:42:39 +00:00
Jeff Squyres
3df754ebd7 Bring over PLPA v1.1 from /tmp-public/plpa-v1.1 branch.
This commit was SVN r17702.
2008-03-05 00:16:49 +00:00
Christian Bell
c3d0a81cd3 Add new QLogic adapters to hca-params.init
This commit was SVN r17699.
2008-03-04 22:14:27 +00:00
Ralph Castain
55c727cea4 Fix compiler warning
This commit was SVN r17684.
2008-03-04 15:46:37 +00:00
Rich Graham
67ad9b6d6b increase max data segments size.
This commit was SVN r17677.
2008-03-02 19:11:09 +00:00
Gleb Natapov
08abafdaa1 Initialize ib_pd to NULL.
This commit was SVN r17674.
2008-03-02 09:11:23 +00:00
Rich Graham
53126fa7bd add calls to opal_progress()
This commit was SVN r17673.
2008-02-29 23:25:09 +00:00
Rich Graham
d37db14901 get the shared memory collectives working again with the new
version of orte.

This commit was SVN r17672.
2008-02-29 22:28:57 +00:00
Rich Graham
c253a7bda1 simplify the code a bit.
This commit was SVN r17664.
2008-02-29 03:55:12 +00:00
Rich Graham
1632d8b299 revert to an older (not previously checked in) version to get around a
regression.

This commit was SVN r17663.
2008-02-29 03:12:12 +00:00
Rich Graham
827e8d877e fix bug in node type, and some memory copy optimizations.
This commit was SVN r17661.
2008-02-29 01:20:11 +00:00
Rich Graham
940d6732c9 remove compiler warnings.
This commit was SVN r17656.
2008-02-28 22:01:19 +00:00
Tim Prins
84b2099fe8 Remove the now-unused orte_value_array. As this is the last 'class' split between orte and ompi, remove the big comment about the split in ompi_bitmap.
Also, update some properties (source files should not be executable...), and remove a couple unneeded inclusions of orte_proc_table.h

This commit was SVN r17655.
2008-02-28 21:39:42 +00:00
Rich Graham
2b5fab9d51 avoid 0 byte malloc.
This commit was SVN r17653.
2008-02-28 21:11:42 +00:00
Rich Graham
4b26adef00 remove some debug output.
This commit was SVN r17650.
2008-02-28 20:54:35 +00:00
Ralph Castain
48e5840c50 Restore a placeholder to make non-SVN SCM's happy.
This commit was SVN r17648.
2008-02-28 20:19:22 +00:00
Rich Graham
5df6c6d043 fix several race conditions.
This commit was SVN r17645.
2008-02-28 19:40:19 +00:00
George Bosilca
9d421bea2a Replace all occurrences of orte_pointer_array by opal_pointer_array. Remove the
implementation of orte_pointer_array.

This commit was SVN r17636.
2008-02-28 05:32:23 +00:00
George Bosilca
678e6c7f0d This is a Mercurial file.
This commit was SVN r17635.
2008-02-28 05:18:06 +00:00
Ralph Castain
d70e2e8c2b Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately.
Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer

This commit was SVN r17632.
2008-02-28 01:57:57 +00:00
Aurelien Bouteiller
76e6334a57 This change is a mistake. CONVERTOR METHOD does not work with unpatched trunk. Revert back to PACK_METHOD.
This commit was SVN r17629.
2008-02-27 20:02:25 +00:00
Aurelien Bouteiller
1d57b8b0e0 Replaced all the (long) casts by PRIsize_t. Should definitively resolve the compiler warnings that appeared from time to time depending on sizeof(size_t)...
This commit was SVN r17627.
2008-02-27 19:58:18 +00:00
Rich Graham
68aa691171 checkpoint work.
This commit was SVN r17620.
2008-02-27 14:56:36 +00:00
Galen Shipman
b378c8c12c return success.
This commit was SVN r17612.
2008-02-27 02:15:53 +00:00
Galen Shipman
44003a41f2 Update common_portals to allow using portals interconnect with a modex rather
than relying on cnos to get the nid/pid map. 

This commit was SVN r17588.
2008-02-25 19:17:21 +00:00
Rich Graham
b4bbb70bb7 got it all, but for the mem copies. Also, need to make sure volatile declarations are all in place, as well as memory barriers.
This commit was SVN r17572.
2008-02-25 00:16:21 +00:00
Rich Graham
2d8c2420e8 checkpoint.
This commit was SVN r17571.
2008-02-24 20:54:16 +00:00
Rich Graham
771584bff5 generate reduction tree.
This commit was SVN r17569.
2008-02-24 03:25:40 +00:00
Brian Barrett
bc8d863ce3 * Make Portals BTL compile again (looks like the frag ownership stuff didn't
get copied well)
* Clean up a bunch of warnings

This commit was SVN r17562.
2008-02-23 01:45:36 +00:00
Donald Kerr
437e280829 removing a few superfluous casts when the base or super is available
This commit was SVN r17554.
2008-02-22 20:10:55 +00:00
Ralph Castain
b4ec81a9fd Fix the Panasas support in ROMIO so it builds without complaints. Required a patch from Brian, plus a few edits by me to remove warnings.
NOTE: the code provided by PANASAS includes a "switch" that they left incomplete - it doesn't cover all possibilities. Since the value being switched is an enum, this causes problems for the compiler. I added the missing values, but - since Panasas felt they could be ignored - had the switch generate an error if those cases ever occurred.

This commit was SVN r17543.
2008-02-21 20:35:34 +00:00
Donald Kerr
fe51084d8e fix compile warning by casting btl udapl module to base module before call to mca_btl_udapl_free
This commit was SVN r17541.
2008-02-21 16:19:06 +00:00
Pierre Lemarinier
2a99f89631 Modification of the mutex lock order to prevent races during connection stage.
This commit was SVN r17535.
2008-02-20 18:17:58 +00:00
Rich Graham
b9bb78484d a bit of optimization.
This commit was SVN r17528.
2008-02-20 16:19:49 +00:00
Pavel Shamis
a0d12a9c92 Adding support for APM over different ports
This commit was SVN r17521.
2008-02-20 13:44:05 +00:00
Rich Graham
09afc36f5f correct addressing.
This commit was SVN r17519.
2008-02-20 01:12:43 +00:00
Rich Graham
b87b15580c fix memory allocation error. Initialize pointer.
This commit was SVN r17514.
2008-02-19 20:01:42 +00:00
Gleb Natapov
60c151608c Set flags inside fragment allocation function.
This commit was SVN r17508.
2008-02-19 12:26:45 +00:00
Nysal Jan
479f36adfc Fix a SEGV on ppc64. size_t is 8 bytes on a 64-bit build
This commit was SVN r17507.
2008-02-19 11:01:21 +00:00
Jeff Squyres
5bb1e5151f Suggestions/patches from Brian to make stuff better:
* Include all the stuff that is necessary for running autogen.sh in a
  distribution tarball.
* Remove from config/Makefile.am's EXTRA_DIST that which is
  automatically included in the tarball in recent versions of
  Automake (i.e., all the m4 files that are acincluded).
* Make ROMIO's configure script look for something that is actually
  included in the tarball.

Fixes trac:1025.

This commit was SVN r17505.

The following Trac tickets were found above:
  Ticket 1025 --> https://svn.open-mpi.org/trac/ompi/ticket/1025
2008-02-19 01:49:52 +00:00
Jeff Squyres
f22f62ef1f Fix typos.
This commit was SVN r17502.
2008-02-18 21:26:21 +00:00
Jeff Squyres
33a4aff18e Make openib btl a bit more resilient in the face of driver errors --
return OMPI_ERR_UNREACH if the port returns an invalid speed or
width.  OMPI_ERR_VALUE_OUT_OF_BOUNDS is reserved for when we exceed
the number of allowable BTLs.

This commit was SVN r17500.
2008-02-18 20:28:06 +00:00
George Bosilca
7a21d77b29 Remove some compilation warnings.
This commit was SVN r17498.
2008-02-18 18:55:32 +00:00
George Bosilca
fa31ec81d0 Add the ownership flags to the PML/BTL interface. The layer
owning the descriptor is responsible for releasing it once
the descriptor is not in use anymore.

This commit was SVN r17497.
2008-02-18 17:39:30 +00:00
Shiqing Fan
653857ddbe Wrong function name was copied here.
This commit was SVN r17486.
2008-02-17 19:47:47 +00:00
Gleb Natapov
354c5bc5e1 Don't call progress() from OB1 fragment scheduling functions. They don't serve
any purpose and cause recursive calls to the progress engine.

This commit was SVN r17478.
2008-02-17 12:42:32 +00:00
Rich Graham
1cd8a2e578 checkpoint - works for 2 procs, but not more.
This commit was SVN r17477.
2008-02-17 05:21:58 +00:00
Rich Graham
8006927ae8 free buffer, rather than ask for another one, when done with the memory.
This commit was SVN r17468.
2008-02-15 04:21:58 +00:00
Rich Graham
2277b47ab9 register mca_coll_sm2_allreduce_intra - function still does not do any
reduction operations.

This commit was SVN r17467.
2008-02-15 04:13:00 +00:00
Rich Graham
9b0687e6df add buffer allocation and deallocation calls to the allreduce routine, so
I can start debugging the memory management code.  The allreduce function
does nothing at this stage.

This commit was SVN r17466.
2008-02-15 03:59:14 +00:00
George Bosilca
be2579467a With the new ompi_free_list this is not needed anymore.
This commit was SVN r17465.
2008-02-15 03:22:16 +00:00
Rich Graham
41943dbd76 adding missing files.
This commit was SVN r17462.
2008-02-15 00:59:28 +00:00
Rich Graham
41f4b06b39 buffer allocate/release code is fully written, and compiles. Now need to debug.
This commit was SVN r17461.
2008-02-15 00:57:44 +00:00
Rich Graham
7cc58768cd checkpoint something that compiles
This commit was SVN r17460.
2008-02-15 00:33:14 +00:00
Rich Graham
292d930eea check point.
This commit was SVN r17457.
2008-02-14 20:00:26 +00:00
Donald Kerr
58bf7f5a1d add uintptr_t to prevent the possibility of a sign extension occurring
This commit was SVN r17456.
2008-02-14 19:16:34 +00:00
Aurelien Bouteiller
3ffe845187 Fixed warning.
This commit was SVN r17454.
2008-02-14 15:18:19 +00:00
Jeff Squyres
6420db7088 Add missing header file that caused compilation errors in the
rhc-step2b branch last night.

This commit was SVN r17453.
2008-02-14 14:10:27 +00:00
George Bosilca
255cd2186b Improve the performance of the MX BTL. Correct the fake PUT
protocol.

This commit was SVN r17452.
2008-02-14 04:38:55 +00:00
Adrian Knoth
f1648f08df Advanced address selection code from Thomas Peiselt. Re #1207, #1027
This commit was SVN r17450.
2008-02-13 21:53:00 +00:00
Sharon Melamed
5b2dab2439 Reverted commit # r17443
This commit was SVN r17446.

The following SVN revision numbers were found above:
  r17443 --> open-mpi/ompi@88ce5a2b73
2008-02-13 14:07:12 +00:00
Sharon Melamed
88ce5a2b73 Replaced PLPA with the latest PLPA (plpa-1.1a3r123)
This commit was SVN r17443.
2008-02-13 13:09:11 +00:00
Gleb Natapov
0a1fa2cb56 req_match_received is set inside MCA_PML_OB1_RECV_REQUEST_MATCHE().
This commit was SVN r17442.
2008-02-13 08:34:39 +00:00
Gleb Natapov
876f49f1a7 Remove unnecessary assignment. It is done later in the same function.
This commit was SVN r17441.
2008-02-13 08:28:25 +00:00
Jeff Squyres
17ede97ef8 Two fixes to revert some long-ago decisions that seemed like a good
idea at the time, but led to logistical difficulties in importing new
versions of ROMIO: 

* We are effectively eliminating the ROMIO file prefix rule hacks in
  the ROMIO component, which create symlinks from foo.c to
  io_romio_foo.c.  In reality, the file name conflict potential will
  be small.
* Additionally, we are effectively eliminating the ROMIO function
  prefix rule in the ROMIO component.  This is another place where
  there are generally problems with merging up new versions of ROMIO
  and/or patches from the user community (for their own local builds).
  In reality, since other major MPI implementations provide the same
  exact symbols, it won't cause any practical problems for users.

In return, we make it ''much'' simpler to apply ROMIO patches to Open
MPI.  The problem right now is that any patch will have filenames such
as ad_panfs.c, but Open MPI will only have io_romio_ad_panfs.c, making
things extremely difficult for users.  I believe, for example, that
this would make it possible for LANL to have applied their patches
without too much hassle on either their part or our part.  It will
also make things easier for OMPI when we/they want to do the next
ROMIO upgrade (this was one of the sources of problems on each
upgrade).

This commit was SVN r17436.
2008-02-12 18:55:17 +00:00
Shiqing Fan
54c7b71cfd Use the correct way of including memchecker.h, which will work with '--with-devel-headers'.
This commit was SVN r17435.
2008-02-12 18:01:17 +00:00
Rainer Keller
7621800477 - Fix and add comments -- output full name for pd
- Protect argument in macro...

This commit was SVN r17434.
2008-02-12 16:59:59 +00:00
Jeff Squyres
6adc5015f9 This file was accidentally re-introduced in r17409.
This commit was SVN r17428.

The following SVN revision numbers were found above:
  r17409 --> open-mpi/ompi@98f70d6318
2008-02-12 13:07:44 +00:00
Shiqing Fan
f5792bbda5 merging the memchecker into trunk.
This commit was SVN r17424.
2008-02-12 08:46:27 +00:00
Gleb Natapov
cf801edfe5 Use carto topology framework to choose which HCAs to use.
This commit was SVN r17414.
2008-02-11 10:34:11 +00:00
George Bosilca
ee321748a6 The lost space.
This commit was SVN r17413.
2008-02-10 22:08:49 +00:00
George Bosilca
55179b833c Unexpected ... Removing unistd.h from datatype.h break the compilation
of the pml_base_bsend ... 

This commit was SVN r17412.
2008-02-10 21:49:19 +00:00
Tim Prins
b88a3f7a94 Update onesided components to fix the case (on 64 bit machines) where the total offset is greater than 2^31-1 bytes.
See: http://www.open-mpi.org/community/lists/users/2008/01/4880.php

This commit was SVN r17400.
2008-02-07 18:45:35 +00:00
Pavel Shamis
df787bbeab Fixing compilation issue on machines with ofed under 1.3.
Also a fix in the apm migration flow.

This commit was SVN r17383.
2008-02-06 13:54:58 +00:00
Pavel Shamis
3ba3f70624 Adding apm support for xrc.
This commit was SVN r17382.
2008-02-06 10:19:51 +00:00
Gleb Natapov
03c80bdfe3 Fix old libibverbs case.
This commit was SVN r17370.
2008-02-04 14:05:01 +00:00
Pavel Shamis
f0c478e7e0 XRC - replacing the old API with the new one.
This commit was SVN r17369.
2008-02-04 14:03:38 +00:00
Gleb Natapov
67f752dd50 Add compatibility function between old libibverbs and current libibverbs
way of detecting HCAs.

This commit was SVN r17365.
2008-02-03 15:16:24 +00:00
George Bosilca
3a6d2e3894 The latest and greatest Elan improvements.
This commit was SVN r17361.
2008-02-01 21:29:57 +00:00
Edgar Gabriel
77057a50a3 - adding the two-level hierarchy detection algorithm
- minor fix in the temporary collectives 
- removing the symmetric parameter, since it didn't really make sense.

This commit was SVN r17359.
2008-02-01 17:11:36 +00:00
Rich Graham
fda485ff9c backing file is allocated and deallocated.
This commit was SVN r17358.
2008-02-01 15:26:20 +00:00
Gleb Natapov
f73adf69c0 Fix compiler warnings on 32bit systems.
This commit was SVN r17346.
2008-01-31 09:05:25 +00:00
Adrian Knoth
8ae4a10b4c Reverted r17331, r17332. Still broken. I'm in a bad hurry. :-( Re #1206
This commit was SVN r17333.

The following SVN revision numbers were found above:
  r17331 --> open-mpi/ompi@3846e2a797
  r17332 --> open-mpi/ompi@c03de08c55
2008-01-30 16:51:55 +00:00
Adrian Knoth
c03de08c55 Logic is wrong. I'm going to revert it again. Re #1206
This commit was SVN r17332.
2008-01-30 16:48:50 +00:00
Adrian Knoth
3846e2a797 When checking incoming connections, also care about aliased interfaces.
Re #1206

This commit was SVN r17331.
2008-01-30 16:45:41 +00:00
Adrian Knoth
7f79c68930 Reverted r17307 and r17308. It broke parallel TCP connections. Re #1206
This commit was SVN r17329.

The following SVN revision numbers were found above:
  r17307 --> open-mpi/ompi@7a59b3f58c
  r17308 --> open-mpi/ompi@72b29bc21f
2008-01-30 14:31:47 +00:00
Aurelien Bouteiller
4da1258d60 Quick fix for static builds (mca_component_retain always returns failure in static build mode, so just blatantly ignore the failure). Though, this may crash severely sometime later if the failure occurs while in dso mode.
This commit was SVN r17328.
2008-01-30 10:41:49 +00:00
George Bosilca
4e703741b7 Move the PML tags into the legal range.
This commit was SVN r17326.
2008-01-30 00:09:45 +00:00
Adrian Knoth
72b29bc21f Cosmetic patch. Use IN6_ARE_ADDR_EQUAL instead of memcmp(). Re #1206.
This commit was SVN r17308.
2008-01-29 16:02:24 +00:00
Adrian Knoth
7a59b3f58c accept incoming connections from hosts with multiple addresses.
We loop over all peer addresses and accept when one of them matches.
Note that this might break functionality: mca_btl_tcp_proc_insert now
always inserts the same endpoint. (is the lack of endpoints the problem?
should there be one for every remote address?)

Re #1206

This commit was SVN r17307.
2008-01-29 15:55:56 +00:00
Pavel Shamis
7b59f8ae0b Fixing warning in apm code.
This commit was SVN r17306.
2008-01-29 15:45:18 +00:00
Gleb Natapov
bb03e07ec4 Move eager RDMA channels accounting into completion callback. Otherwise it
can go wrong with XRC as endpoint may be not yet connected at the time
eager rdma channel is created.

This commit was SVN r17302.
2008-01-29 14:35:33 +00:00
Pavel Shamis
92ef832472 Making sure that XRC will not overrun ib_dev_attr.max_qp_wr
This commit was SVN r17300.
2008-01-29 13:15:21 +00:00
Aurelien Bouteiller
2fd8230025 Windows might not be the only one...
This commit was SVN r17296.
2008-01-29 07:44:33 +00:00
Aurelien Bouteiller
bd10a0231f Replaced the explicit include of inttypes.h by the opal replacement.
This commit was SVN r17295.
2008-01-29 07:35:14 +00:00
Aurelien Bouteiller
e261861f4a Major build system modification. Removed symlinks (problem with make dist), solved issues with static builds and can accept most compile options. The only unsupported compile option for now is --enable-mca-no-build=pml-v. Still investigating this...
This commit was SVN r17294.
2008-01-29 06:07:57 +00:00
George Bosilca
fad6136794 To be or not to be ! As DR requires 64-bit atomics, only allow it to
build when thread support is disabled or we have 64-bit atomics support.

This commit was SVN r17293.
2008-01-29 05:24:56 +00:00
Rich Graham
165fc3f8cc memory allocation implemented and debugged. Still need to finish
file allocation/deallocation and control information initialization.

This commit was SVN r17291.
2008-01-29 03:09:12 +00:00
Pavel Shamis
7d83f34eb0 Protecting the apm code with OMPI_HAVE_THREADS.
This commit was SVN r17284.
2008-01-28 16:10:18 +00:00
Jeff Squyres
6a49c97368 Remove erroneous #if
This commit was SVN r17282.
2008-01-28 14:38:03 +00:00
Pavel Shamis
28a3917306 Adding APM support (over different lids).
This commit was SVN r17280.
2008-01-28 10:38:08 +00:00
George Bosilca
c5d5fcf50a Protect the standard header file, and allow the PML V to compile
on Windows.

This commit was SVN r17250.
2008-01-26 18:43:06 +00:00
Aurelien Bouteiller
ca8eb1fb30 There should be no leftovers of configuration phase after distclean
This commit was SVN r17249.
2008-01-26 09:56:02 +00:00
Aurelien Bouteiller
b5d44261a0 Fix one warning about extremely long lines (due to macro expansion)
This commit was SVN r17247.
2008-01-26 00:38:33 +00:00
Aurelien Bouteiller
48cabdc40b Changed build system. Should be more distcheck, VPATH, static and other compilation mode friendly.
This commit was SVN r17245.
2008-01-25 23:57:01 +00:00
Rich Graham
e24c2ebbc0 have a working skeleton for the SM-V2 component. It does nothing at this stage.
This commit was SVN r17241.
2008-01-25 21:16:36 +00:00
Rich Graham
1d0334f4f2 skeleton for new shared memory collective component.
This commit was SVN r17235.
2008-01-25 19:35:26 +00:00
Rainer Keller
f7e586fc01 - allow --enable-mca-direct=pml-ob1
This commit was SVN r17227.
2008-01-25 09:56:45 +00:00
Rich Graham
432ba0cecd add comments about the life-cycle of a collective module.
This commit was SVN r17223.
2008-01-25 03:46:31 +00:00
George Bosilca
3418485085 Replace the tport by a queue.
This commit was SVN r17221.
2008-01-25 01:15:18 +00:00
Aurelien Bouteiller
e471abb55e put back ompi ignore until long filenames and other dist issues are fixed
This commit was SVN r17219.
2008-01-25 00:28:30 +00:00
Donald Kerr
66acac8ff3 the value for invalid idx was just plain wrong, a more appropriate value is now used
This commit was SVN r17201.
2008-01-24 15:01:26 +00:00
Jeff Squyres
2227d5ec4a Add configure check for struct ibv_device.transport type, which was added in OFED v1.2. Still need to fix up oob and rdma_cm cpc's to do something better with this information...
This commit was SVN r17198.
2008-01-24 12:14:21 +00:00
Aurelien Bouteiller
11815d9773 Fixed two warnings (especially the one that gets repeated a large number of times in 64bit builds)
This commit was SVN r17197.
2008-01-24 04:59:31 +00:00
Aurelien Bouteiller
a9045402c4 remove a pedantic warning
This commit was SVN r17196.
2008-01-24 02:29:07 +00:00
Aurelien Bouteiller
76b13f91b9 fixed link error in static mode
This commit was SVN r17194.
2008-01-23 23:54:02 +00:00
Aurelien Bouteiller
f29ed2ed53 fixed missing errno.h on some architectures
This commit was SVN r17186.
2008-01-23 20:24:54 +00:00
Aurelien Bouteiller
6fe17aff4a solve compatibility issue from MMAP_NOCACHE
This commit was SVN r17184.
2008-01-23 19:29:19 +00:00
Aurelien Bouteiller
69b3bae999 removed ignore, as the code is robust enough to avoid interfering with others
This commit was SVN r17182.
2008-01-23 17:27:23 +00:00
Gleb Natapov
6e4155d111 Initialize local variable before use.
This commit was SVN r17170.
2008-01-21 15:17:49 +00:00
Gleb Natapov
52c94fa7ea Fix compilation warnings.
This commit was SVN r17169.
2008-01-21 15:07:39 +00:00
Gleb Natapov
c9a1b06771 Remove trailing whitespaces. No code changes in this commit.
This commit was SVN r17167.
2008-01-21 12:11:18 +00:00
George Bosilca
31390c0074 We should take into account the extent of the datatype when we compute
the initial displacement in bytes. Thanks to Daniel G. Hyams for the fix.

This commit was SVN r17165.
2008-01-19 05:34:53 +00:00
George Bosilca
170416797d This commit was SVN r17162.
2008-01-18 20:10:57 +00:00
George Bosilca
0081202195 Mark the receives as ELAN_TPORT_RXBUF | ELAN_TPORT_RXANY ...
This commit was SVN r17161.
2008-01-18 20:00:44 +00:00
George Bosilca
bf299bb833 Keep most of the functions as static. Improve the progress function. Get rid
of all internal queues that are not really useful.

This commit was SVN r17160.
2008-01-18 19:28:50 +00:00
Donald Kerr
5f884b1ca4 fix for #1130 - adds support for multi-rail configurations
This commit was SVN r17152.
2008-01-17 17:30:50 +00:00
Donald Kerr
908b514ac5 update use of internal tag values to accommodate the active message change found in r17140
This commit was SVN r17148.

The following SVN revision numbers were found above:
  r17140 --> open-mpi/ompi@6310ce955c
2008-01-16 21:17:25 +00:00
Pavel Shamis
add4d9df8a XRC fixes for MPI2 dynamics.
This commit was SVN r17144.
2008-01-15 21:14:48 +00:00
Jeff Squyres
251842ff6a Remove this AS_IF -- it breaks "make dist".
This commit was SVN r17143.
2008-01-15 12:33:08 +00:00
George Bosilca
e8ac5ff04d Typos.
This commit was SVN r17141.
2008-01-15 05:37:42 +00:00
George Bosilca
6310ce955c The first patch related to the Active Message stuff. So far, here is what we have:
- the registration array is now global instead of one per BTL.
- each framework has to declare the entries in the registration array reserved. Then
  it has to define the internal way of sharing (or not) these entries between all
  components. As an example, the PML will not share as there is only one active PML
  at any moment, while the BTLs will have to. The tag is 8 bits long, the first 3
  are reserved for the framework while the remaining 5 are used internally by each
  framework.
- The registration function is optional. If a BTL does not provide such a function,
  nothing happens. However, in the case where such a function is provided in the BTL
  structure, it will be called by the BML when a tag is registered.

Now, it's time for the second step... Converting OB1 from a switch based PML to an
active message one.

This commit was SVN r17140.
2008-01-15 05:32:53 +00:00
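The tag layout described in r17140 above (an 8-bit tag, 3 bits reserved for the framework, 5 bits used inside the framework) can be sketched roughly as follows. This is an illustration only: the macro and function names are hypothetical and not taken from the Open MPI source, and treating the "first 3" bits as the high-order bits is an assumption.

```c
#include <stdint.h>

/* Hypothetical names; the commit message only states the 3 + 5 bit split. */
#define AM_TAG_FRAMEWORK_BITS 3u
#define AM_TAG_INTERNAL_BITS  5u
#define AM_TAG_FRAMEWORK_MAX  ((1u << AM_TAG_FRAMEWORK_BITS) - 1u) /* 7  */
#define AM_TAG_INTERNAL_MASK  ((1u << AM_TAG_INTERNAL_BITS) - 1u)  /* 31 */

/* Pack a framework id (0..7) and a framework-internal id (0..31) into one
 * 8-bit tag. Assumption: the framework id occupies the high-order bits. */
static uint8_t am_tag_make(unsigned framework, unsigned internal)
{
    return (uint8_t)(((framework & AM_TAG_FRAMEWORK_MAX) << AM_TAG_INTERNAL_BITS)
                     | (internal & AM_TAG_INTERNAL_MASK));
}

/* Recover the framework id from a packed tag. */
static unsigned am_tag_framework(uint8_t tag)
{
    return (unsigned)tag >> AM_TAG_INTERNAL_BITS;
}

/* Recover the framework-internal id from a packed tag. */
static unsigned am_tag_internal(uint8_t tag)
{
    return (unsigned)tag & AM_TAG_INTERNAL_MASK;
}
```

With this split a single global registration array needs at most 256 entries, and each framework can hand out up to 32 tags among its own components.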
George Bosilca
98f79f2ea0 Remove the second declaration of the PML V component.
This commit was SVN r17139.
2008-01-15 05:26:26 +00:00
Jon Mason
a0d4122606 The new cpc selection framework is now in place. The patch below allows
for dynamic selection of cpc methods based on what is available.  It
also allows for inclusion/exclusions of methods.  It even further allows
for modifying the priorities of certain cpc methods to better determine
the optimal cpc method.

This patch also contains XRC compile time disablement (per Jeff's
patch).

At a high level, the cpc selection works by walking through each cpc
and allowing it to test to see if it is permissible to run on this
mpirun.  It returns a priority if it is permissible or a -1 if not.  All
of the cpc names and priorities are rolled into a string.  This string
is then encapsulated in a message and passed around all the ompi
processes.  Once received and unpacked, the list received is compared
to a local copy of the list.  The connection method is chosen by
comparing the lists passed around to all nodes via modex with the list
generated locally.  Any non-negative number is a potentially valid
connection method.  The method below of determining the optimal
connection method is to take the cross-section of the two lists.  The
highest single value (and the other side being non-negative) is selected
as the cpc method.

svn merge -r 16948:17128 https://svn.open-mpi.org/svn/ompi/tmp-public/openib-cpc/ .

This commit was SVN r17138.
2008-01-14 23:22:03 +00:00
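The selection rule described in r17138 above (skip any method that either side marks with a negative priority, then take the method with the highest single priority) can be sketched roughly as follows. This is a simplified illustration, not the openib BTL code: the function name is hypothetical, and it assumes both priority lists are already unpacked into arrays that use the same method order, whereas the commit message says the real lists travel as a string of names and priorities.

```c
#include <stddef.h>

/* Hypothetical sketch. local[i] and remote[i] are the priorities that the
 * local and remote process assigned to method i; a negative value means
 * that side cannot use the method. Returns the index of the selected
 * method, or -1 if the two sides have no usable method in common. */
static int cpc_select(const int *local, const int *remote, size_t n)
{
    int best_index = -1;
    int best_priority = -1;

    for (size_t i = 0; i < n; ++i) {
        /* A method must be usable on both sides to be considered. */
        if (local[i] < 0 || remote[i] < 0) {
            continue;
        }
        /* "Highest single value": the larger of the two priorities. */
        int priority = local[i] > remote[i] ? local[i] : remote[i];
        if (priority > best_priority) {
            best_priority = priority;
            best_index = (int)i;
        }
    }
    return best_index;
}
```

Because the larger of the two priorities is symmetric in its arguments, both peers rank the methods identically when they evaluate the same pair of lists in the same order, which is what lets each side pick a method without a further round of negotiation.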
Pavel Shamis
6e50fca2dd Fixing permissions for XRC domain file.
This commit was SVN r17127.
2008-01-13 19:23:11 +00:00
Jon Mason
626e0814a2 Style clean-up
This commit was SVN r17126.
2008-01-12 18:47:17 +00:00
Ron Brightwell
b02cad2a0b added optional rendezvous protocol for long messages
This commit was SVN r17124.
2008-01-11 22:12:45 +00:00
George Bosilca
3fca3973d3 The PTLs are now long gone !!!
This commit was SVN r17104.
2008-01-10 00:18:45 +00:00
Jon Mason
3970c3ff6c Add Chelsio T3 to ompi/mca/btl/openib/mca-btl-openib-hca-params.ini
This commit was SVN r17101.
2008-01-09 22:14:18 +00:00
Jon Mason
597c7e68f1 Minor cleanups
This commit was SVN r17100.
2008-01-09 21:54:11 +00:00
George Bosilca
1bd31aa3ac Cleanup the OMPI_DECLSPEC/OMPI_MODULE_DECLSPEC in the PMLs.
This commit was SVN r17093.
2008-01-09 20:32:39 +00:00
Rolf vandeVaart
870fa8b1f1 Pad the sm btl header to double-word alignment. Preserves PML
header as double-word aligned and prevents bus errors on SPARC
based servers.  This is part of fix for #1148.

Refs trac:1148

This commit was SVN r17090.

The following Trac tickets were found above:
  Ticket 1148 --> https://svn.open-mpi.org/trac/ompi/ticket/1148
2008-01-09 18:50:51 +00:00
Gleb Natapov
25ce70bb92 Call mca_btl_openib_endpoint_post_send() holding endpoint lock and not holding
qp lock since this is what the function assumes.

This commit was SVN r17086.
2008-01-09 14:46:41 +00:00