George Bosilca
6e6c370917
Rollback r18274, as it's legal to have a sequence number smaller than the expected one. It doesn't necessarily mean the message is duplicated; it can simply signify that the message is out of sequence and the counter overflowed.
This commit was SVN r18323.
The following SVN revision numbers were found above:
r18274 --> open-mpi/ompi@73c9de3af9
2008-04-27 18:35:54 +00:00
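The reasoning behind the r18323 rollback above can be illustrated with a small sketch. It assumes a 16-bit wrapping sequence counter; the type `seq_t` and both helper functions are hypothetical names for illustration, not code from the PML. A plain `<` comparison flags a message as "old" even when the counter has merely wrapped, whereas a modular distance shows it is a few steps ahead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the sequence counter is 16 bits wide and wraps to 0. */
typedef uint16_t seq_t;

/* Naive check: any numerically smaller value looks like a duplicate. */
static bool naive_is_duplicate(seq_t received, seq_t expected)
{
    return received < expected;
}

/* Signed distance from 'expected' to 'received', modulo 2^16.
 * Positive: the message is ahead of the expected one (out of order).
 * Negative: the message is behind the expected one. */
static int seq_distance(seq_t received, seq_t expected)
{
    uint16_t d = (uint16_t)(received - expected);
    return d < 0x8000u ? (int)d : (int)d - 0x10000;
}
```

With `expected = 65534` and `received = 2`, the naive check reports a duplicate, while `seq_distance` returns 4: the message is simply early and the counter overflowed. The modular view is itself only reliable while fewer than half the counter range is in flight.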
Aurelien Bouteiller
c20b020ea6
Fix ticket #1275. The pml v can now be correctly deactivated on the configure command line. Also fix a dist target under some unusual circumstances.
...
This commit was SVN r18291.
2008-04-24 21:42:54 +00:00
Josh Hursey
2c736873bb
Fix a checkpoint/restart bug that causes a restarted application to occasionally throw a SIGSEGV or SIGPIPE due to invalid socket descriptors.
...
The problem was caused by a bad ordering between the restart of the ORTE level TCP connections (in the OOB - out-of-band communication) and the Open MPI level TCP connections (BTLs). Before this commit, ORTE would shut down and restart the OOB completely before the OMPI level restarted its TCP connections. As a result, a socket descriptor used by the OMPI level at checkpoint could be assigned to the ORTE level on restart. The OMPI level had no knowledge that the socket descriptor it was previously using had been recycled, so it closed it on restart. This caused the ORTE level to break, as the newly created socket descriptor was closed without its knowledge.
The fix is to have the OMPI level shut down its TCP connections, allow the ORTE level to restart, and then allow the OMPI level to restart its connections. This seems obvious, and I'm surprised that this bug has not cropped up sooner. I'm confident that this specific problem has been fixed with this commit.
Thanks to Eric Roman and Tamer El Sayed for their help in identifying this problem, and patience while I was fixing it.
* Add a new state {{{OPAL_CRS_RESTART_PRE}}}. This state identifies when we are on the down slope of the INC (finalize-like) which is useful when you want to close, but not reopen a component set for fear of interfering with a lower level.
* Use this new state in OMPI level coordination. Here we want to make sure to play well with both the OMPI/BTL/TCP and ORTE/OOB/TCP components.
* Update ft_event functions in PML and BML to handle the new restart state.
* Add an additional flag to the error output in OOB/TCP so we can see what the socket descriptor was on failure as this can be helpful in debugging.
This commit was SVN r18276.
2008-04-24 17:54:22 +00:00
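The descriptor-recycling hazard described in r18276 above can be reproduced in miniature. The sketch below is an illustration under stated assumptions, not the Open MPI code: `/dev/null` stands in for a TCP socket, "upper" plays the OMPI/BTL role, "lower" plays the ORTE/OOB role, and all function names are hypothetical. It relies only on the POSIX rule that `open()` returns the lowest unused descriptor number, and it assumes a single-threaded process.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Stand-in for creating a TCP socket; returns a descriptor or -1. */
static int open_standin(void)
{
    return open("/dev/null", O_WRONLY);
}

/* Returns nonzero if fd still refers to an open descriptor. */
static int is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/* Buggy ordering: the lower layer restarts before the upper layer has
 * dropped its stale descriptor number. Returns 1 if the lower layer's
 * new descriptor was recycled and then closed behind its back. */
static int stale_close_breaks_lower_layer(void)
{
    int upper_fd = open_standin();
    if (upper_fd < 0) return -1;
    int stale = upper_fd;           /* upper layer remembers the number */
    close(upper_fd);                /* simulates the descriptor vanishing at checkpoint */

    int lower_fd = open_standin();  /* lower layer restarts first */
    if (lower_fd < 0) return -1;
    int recycled = (lower_fd == stale);

    close(stale);                   /* upper layer "cleans up" its old number */
    int broken = !is_open(lower_fd);
    if (!broken) close(lower_fd);
    return recycled && broken;
}

/* Fixed ordering: upper shuts down, lower restarts, upper reopens. */
static int ordered_restart_is_safe(void)
{
    int upper_fd = open_standin();
    if (upper_fd < 0) return -1;
    close(upper_fd);                /* 1: upper layer shuts down */
    int lower_fd = open_standin();  /* 2: lower layer restarts */
    if (lower_fd < 0) return -1;
    upper_fd = open_standin();      /* 3: upper layer reopens afterwards */
    if (upper_fd < 0) { close(lower_fd); return -1; }
    int ok = (upper_fd != lower_fd) && is_open(lower_fd);
    close(upper_fd);
    close(lower_fd);
    return ok;
}
```

In the first function the lower layer ends up holding a closed descriptor, which is the kind of state that later surfaces as SIGPIPE or a crash. In the second, no layer ever closes a number it does not currently own.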
George Bosilca
3ccac4f803
Oops ...
...
This commit was SVN r18275.
2008-04-24 15:54:52 +00:00
George Bosilca
73c9de3af9
Bark if we got a wrong sequence number. Here wrong means that the seq number is smaller than what we expect.
This commit was SVN r18274.
2008-04-24 15:48:43 +00:00
Josh Hursey
cc83d41ad9
Merge in tmp/jjh-scratch
...
{{{
svn merge -r 18218:18240 https://svn.open-mpi.org/svn/ompi/tmp/jjh-scratch .
}}}
Contains:
* Primarily a fix for a user reported problem where a cached file descriptor is causing a SIGPIPE on restart.
* Clean up some small memory leaks from using mca_base_param_env_var() - Thanks Jeff
* Clean up ORTE FT tool compilation in non-FT builds - Thanks Tim P.
* Clean up mpi interface with misplaced {{{OPAL_CR_ENTER_LIBRARY}}} - Thanks Terry
* Some other sundry cleanup items all dealing with C/R functionality in the trunk.
This commit was SVN r18241.
2008-04-23 00:17:12 +00:00
Ralph Castain
fa082cafa9
Shift the architecture calculation from the ompi/datatype engine to the opal/util area. This allows us to compute the architecture earlier in the launch and communicate it outside of the modex.
...
Note: this is an early preliminary step in the movement of portions of the datatype engine to the opal layer.
This commit was SVN r18198.
2008-04-17 20:43:56 +00:00
Tim Prins
3582e11200
cleanup some warnings on 32 bit systems
...
This commit was SVN r18187.
2008-04-17 12:25:05 +00:00
Ralph Castain
3a0d09300b
Fully implement the inbound binomial allgather for daemon-based collectives. Supports both modex and barrier operations.
...
Comm_spawn still uses the rank=0 method - shifting that algo to the daemons is under study.
This commit was SVN r18115.
2008-04-09 22:10:53 +00:00
Shiqing Fan
28746bbcdb
Remove the memchecker macro from the pml base request (used in req_wait.c), where it was in the wrong place. Instead, a single call from send_request_free and recv_request_free (already done) does all the work, fast and clean.
...
This commit was SVN r18095.
2008-04-07 17:46:50 +00:00
Shiqing Fan
a1e5df1cc9
Use the new memchecker function call which is based on convertor.
...
Remove one unnecessary call.
This commit was SVN r18085.
2008-04-07 07:52:04 +00:00
George Bosilca
b4f828f389
We need a newline at the end of the file, or some compilers bark.
...
This commit was SVN r18023.
2008-03-30 19:05:56 +00:00
Aurelien Bouteiller
77653ac787
A missing .h file in the makefile broke the nightly tarball distcheck...
...
This commit was SVN r18006.
2008-03-28 14:36:56 +00:00
Aurelien Bouteiller
c16339944a
Fix a coverity warning about using unsafe sprintf.
...
This commit was SVN r17999.
2008-03-27 21:24:27 +00:00
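The commit above (r17999) does not say which replacement was used for the unsafe `sprintf`. A common remedy, shown here as a minimal sketch with a hypothetical helper name and format, is to switch to `snprintf` and check its return value for truncation.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats "<prefix>.<rank>" into buf without ever writing past len bytes.
 * Returns 0 on success, -1 on an encoding error or if the output was
 * truncated. The helper and its format are illustrative only. */
static int format_name(char *buf, size_t len, const char *prefix, int rank)
{
    int n = snprintf(buf, len, "%s.%d", prefix, rank);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Unlike `sprintf`, `snprintf` stops at the buffer size and always NUL-terminates when `len` is nonzero, so a too-small buffer becomes a reported error rather than a memory overwrite.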
Aurelien Bouteiller
e11237aadb
Introduction of the "progress" sender_based method to replace the slow isend-self method.
...
This commit was SVN r17998.
2008-03-27 21:19:45 +00:00
Aurelien Bouteiller
93db01871e
This is part of the previous patch.
...
This commit was SVN r17997.
2008-03-27 21:06:14 +00:00
Aurelien Bouteiller
f8bf6f2c6a
Code cleanup.
...
sender_based.h is now split in two files, to solve cyclic .h files inclusion.
Most macros are now inline functions.
Variable names have been changed in various places.
Various other small things...
This commit was SVN r17996.
2008-03-27 21:05:44 +00:00
Gleb Natapov
cf40674369
Decide if sends should be throttled at the receiver and pass this to the sender in an ACK message. The decision can't be made reliably at the sender.
This commit was SVN r17987.
2008-03-27 08:56:43 +00:00
Galen Shipman
0116041133
BTL shouldn't own the passive side's descriptor in the PML get protocol. The BTL doesn't know when to free it on the passive side.
This commit was SVN r17943.
2008-03-25 01:43:41 +00:00
George Bosilca
8943ae0b4e
Cleanup plus some typos.
...
This commit was SVN r17858.
2008-03-18 03:03:33 +00:00
Josh Hursey
612ebdc2ac
Clean up some symbol visibility issues.
...
This commit was SVN r17733.
2008-03-05 13:59:25 +00:00
Ralph Castain
d70e2e8c2b
Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately.
...
Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer.
This commit was SVN r17632.
2008-02-28 01:57:57 +00:00
Aurelien Bouteiller
76e6334a57
This change is a mistake. CONVERTOR METHOD does not work with unpatched trunk. Revert back to PACK_METHOD.
...
This commit was SVN r17629.
2008-02-27 20:02:25 +00:00
Aurelien Bouteiller
1d57b8b0e0
Replaced all the (long) casts with PRIsize_t. This should definitively resolve the compiler warnings that appeared from time to time depending on sizeof(size_t)...
...
This commit was SVN r17627.
2008-02-27 19:58:18 +00:00
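The idea behind r17627 above can be sketched as follows. Casting a `size_t` to `long` and printing it with `%ld` is only correct when both types have the same width; a format macro avoids the cast altogether. `EXAMPLE_PRIsize_t` below is a hypothetical stand-in for the project's `PRIsize_t`, defined here as the C99 `zu` conversion; the real macro's definition is not shown in this log.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a project-level macro such as PRIsize_t.
 * On a C99 compiler, "zu" is the standard conversion for size_t. */
#define EXAMPLE_PRIsize_t "zu"

/* Writes "len=<value>" into buf; returns what snprintf returns. */
static int format_size(char *buf, size_t len, size_t value)
{
    return snprintf(buf, len, "len=%" EXAMPLE_PRIsize_t, value);
}
```

Because the macro expands to a string literal, it is concatenated into the format string at compile time, in the same way as the `PRI*` macros from `<inttypes.h>`.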
George Bosilca
fa31ec81d0
Add the ownership flags to the PML/BTL interface. The layer owning the descriptor is responsible for releasing it once the descriptor is not in use anymore.
This commit was SVN r17497.
2008-02-18 17:39:30 +00:00
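The ownership rule stated in r17497 above can be sketched with a flag on the descriptor. Everything below is hypothetical (the `ex_` names, the flag value, the struct layout); it only illustrates the contract that exactly one layer frees a descriptor, decided by who owns it.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag: set when the lower (transport) layer owns the descriptor. */
#define EX_DES_FLAG_LOWER_OWNERSHIP 0x1u

typedef struct {
    uint32_t flags;
} ex_descriptor_t;

/* Allocates a descriptor with the given flags; returns NULL on failure. */
static ex_descriptor_t *ex_alloc(uint32_t flags)
{
    ex_descriptor_t *d = malloc(sizeof *d);
    if (d != NULL) d->flags = flags;
    return d;
}

/* Completion path in the lower layer: frees the descriptor only if that
 * layer owns it. Returns 1 if freed, 0 if the upper layer must release it. */
static int ex_lower_complete(ex_descriptor_t *d)
{
    if (d->flags & EX_DES_FLAG_LOWER_OWNERSHIP) {
        free(d);
        return 1;
    }
    return 0;
}

/* Release path in the upper layer, used when it kept ownership. */
static void ex_upper_release(ex_descriptor_t *d)
{
    free(d);
}
```

A descriptor whose lifetime only the upper layer can judge is allocated without the flag, so the lower layer never frees it on completion.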
Shiqing Fan
653857ddbe
Wrong function name was copied here.
...
This commit was SVN r17486.
2008-02-17 19:47:47 +00:00
Gleb Natapov
354c5bc5e1
Don't call progress() from OB1 fragment scheduling functions. The calls don't serve any purpose and cause recursive calls to the progress engine.
This commit was SVN r17478.
2008-02-17 12:42:32 +00:00
Aurelien Bouteiller
3ffe845187
Fixed warning.
...
This commit was SVN r17454.
2008-02-14 15:18:19 +00:00
Gleb Natapov
0a1fa2cb56
req_match_received is set inside MCA_PML_OB1_RECV_REQUEST_MATCHE().
...
This commit was SVN r17442.
2008-02-13 08:34:39 +00:00
Gleb Natapov
876f49f1a7
Remove unnecessary assignment. It is done later in the same function.
...
This commit was SVN r17441.
2008-02-13 08:28:25 +00:00
Shiqing Fan
54c7b71cfd
Use the correct way of including memchecker.h, which will work with '--with-devel-headers'.
...
This commit was SVN r17435.
2008-02-12 18:01:17 +00:00
Rainer Keller
7621800477
- Fix and add comments -- output full name for pd
...
- Protect argument in macro...
This commit was SVN r17434.
2008-02-12 16:59:59 +00:00
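"Protect argument in macro" in r17434 above refers to the usual C precaution of parenthesizing macro parameters. The macros below are illustrative examples, not the ones touched by that commit.

```c
/* Unprotected: the argument is pasted in as-is, so operator precedence
 * of the surrounding expression applies to its pieces. */
#define SCALE_UNSAFE(x) (x * 2)

/* Protected: the argument is evaluated as a unit before scaling. */
#define SCALE_SAFE(x) ((x) * 2)

/* Expands to (1 + 2 * 2), which is 5. */
static int unsafe_result(void)
{
    return SCALE_UNSAFE(1 + 2);
}

/* Expands to ((1 + 2) * 2), which is 6. */
static int safe_result(void)
{
    return SCALE_SAFE(1 + 2);
}
```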
Jeff Squyres
6adc5015f9
This file was accidentally re-introduced in r17409.
...
This commit was SVN r17428.
The following SVN revision numbers were found above:
r17409 --> open-mpi/ompi@98f70d6318
2008-02-12 13:07:44 +00:00
Shiqing Fan
f5792bbda5
merging the memchecker into trunk.
...
This commit was SVN r17424.
2008-02-12 08:46:27 +00:00
George Bosilca
55179b833c
Unexpected ... Removing unistd.h from datatype.h breaks the compilation of the pml_base_bsend ...
This commit was SVN r17412.
2008-02-10 21:49:19 +00:00
Aurelien Bouteiller
4da1258d60
Quick fix for static builds (mca_component_retain always returns failure in static build mode, so just blatantly ignore the failure). Though, this may crash severely sometime later if the failure occurs while in dso mode.
...
This commit was SVN r17328.
2008-01-30 10:41:49 +00:00
George Bosilca
4e703741b7
Move the PML tags into the legal range.
...
This commit was SVN r17326.
2008-01-30 00:09:45 +00:00
Aurelien Bouteiller
2fd8230025
Windows might not be the only one...
...
This commit was SVN r17296.
2008-01-29 07:44:33 +00:00
Aurelien Bouteiller
bd10a0231f
Replaced the explicit include of inttypes.h by the opal replacement.
...
This commit was SVN r17295.
2008-01-29 07:35:14 +00:00
Aurelien Bouteiller
e261861f4a
Major build system modification. Removed symlinks (problem with make dist), solved issues with static builds and can accept most compile options. The only unsupported compile option for now is --enable-mca-no-build=pml-v. Still investigating this...
...
This commit was SVN r17294.
2008-01-29 06:07:57 +00:00
George Bosilca
fad6136794
To be or not to be! As DR requires 64-bit atomics, only allow it to build when thread support is disabled or we have 64-bit atomics support.
This commit was SVN r17293.
2008-01-29 05:24:56 +00:00
George Bosilca
c5d5fcf50a
Protect the standard header file, and allow the PML V to compile on Windows.
This commit was SVN r17250.
2008-01-26 18:43:06 +00:00
Aurelien Bouteiller
ca8eb1fb30
There should be no leftovers of configuration phase after distclean
...
This commit was SVN r17249.
2008-01-26 09:56:02 +00:00
Aurelien Bouteiller
b5d44261a0
Fix one warning about extremely long lines (due to macro expansion)
...
This commit was SVN r17247.
2008-01-26 00:38:33 +00:00
Aurelien Bouteiller
48cabdc40b
Changed build system. Should be more distcheck, VPATH, static and other compilation mode friendly.
...
This commit was SVN r17245.
2008-01-25 23:57:01 +00:00
Rainer Keller
f7e586fc01
- allow --enable-mca-direct=pml-ob1
...
This commit was SVN r17227.
2008-01-25 09:56:45 +00:00
Aurelien Bouteiller
e471abb55e
put back ompi ignore until long filenames and other dist issues are fixed
...
This commit was SVN r17219.
2008-01-25 00:28:30 +00:00
Aurelien Bouteiller
11815d9773
Fixed two warnings (especially the one that gets repeated a large number of times in 64-bit builds)
...
This commit was SVN r17197.
2008-01-24 04:59:31 +00:00
Aurelien Bouteiller
a9045402c4
remove a pedantic warning
...
This commit was SVN r17196.
2008-01-24 02:29:07 +00:00
Aurelien Bouteiller
76b13f91b9
fixed link error in static mode
...
This commit was SVN r17194.
2008-01-23 23:54:02 +00:00