on 64-bit platforms sizeof(size_t) != sizeof(orte_std_cntr_t), and we were incorrectly
assuming the two were equal when dealing with num_procs. It worked on little-endian
platforms, but not on big-endian ones. So change num_procs to type int, and cast where needed.
This commit was SVN r11796.
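A minimal sketch of the failure mode described in this commit, not the actual ORTE code. The type `cntr_t` is a hypothetical 32-bit stand-in for orte_std_cntr_t (its exact width is an assumption here), and `get_count()` is a hypothetical callee. The broken path is emulated with memcpy() so the example stays well-defined: only the first sizeof(cntr_t) bytes of the size_t are written, which are the low-order bytes on little-endian machines and the high-order bytes on big-endian ones.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-bit counter type, standing in for orte_std_cntr_t. */
typedef int32_t cntr_t;

/* Hypothetical callee that stores a count through a cntr_t pointer. */
static void get_count(cntr_t *out)
{
    *out = 4;
}

/* Broken pattern: the callee's 32-bit result lands in the first bytes of a
 * wider size_t. This mimics passing (cntr_t *)&num_procs to the callee. */
static size_t count_via_wrong_pointer(void)
{
    size_t num_procs = 0;
    cntr_t tmp;

    get_count(&tmp);
    memcpy(&num_procs, &tmp, sizeof(tmp));
    return num_procs;
}

/* Safer pattern: receive the value in the type the callee expects,
 * then convert explicitly at the point of use. */
static size_t count_via_matching_type(void)
{
    cntr_t tmp;

    get_count(&tmp);
    return (size_t)tmp;
}
```

On a 64-bit little-endian machine both functions return 4, which is why the bug went unnoticed there. On a 64-bit big-endian machine the first one returns 4 shifted into the upper half of the size_t.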
the initialization in the start of the function is going to remain for
the life of the function is erroneous. Need to initialize before
using in the snprintf() call.
Refs trac:340
This commit was SVN r11781.
The following Trac tickets were found above:
Ticket 340 --> https://svn.open-mpi.org/trac/ompi/ticket/340
wider space than getpid()
* Include <time.h> to get time()'s prototype
* Fix typo that prevented using /dev/urandom on systems that had it
This commit was SVN r11780.
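A minimal sketch of the seeding approach this commit touches on, under stated assumptions: `make_seed()` is a hypothetical helper, not the function changed in the commit, and the fallback mixing of time() and getpid() is only one plausible choice. It prefers /dev/urandom where the device exists and falls back otherwise.

```c
#include <fcntl.h>
#include <stdint.h>
#include <time.h>      /* time() prototype */
#include <unistd.h>    /* read(), close(), getpid() */

/* Hypothetical helper: fills *seed and returns 1 if the bytes came from
 * /dev/urandom, or 0 if the time/pid fallback was used. */
static int make_seed(uint32_t *seed)
{
    int fd = open("/dev/urandom", O_RDONLY);

    if (fd >= 0) {
        ssize_t n = read(fd, seed, sizeof(*seed));
        close(fd);
        if (n == (ssize_t)sizeof(*seed)) {
            return 1;
        }
    }

    /* Fallback (an assumption for illustration): combine the clock and
     * the process id so two processes started in the same second still
     * tend to get different seeds. */
    *seed = (uint32_t)time(NULL) ^ ((uint32_t)getpid() << 16);
    return 0;
}
```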
Makefile.am's is from a very old Automake bug which has long since
been fixed. Since we require very recent versions of AM, we don't
need these anymore.
This commit was SVN r11774.
* Use $31 instead of mnemonic zero for the gcc inline
assembly test, as the GNU assembler doesn't like
zero, but both Tru64 and GNU assembler should be fine
with $31
* Disable Linux timer component on Alpha. The CPU timer
rolls over every 10 seconds or less, so it's kinda
worthless for our needs.
* Fix some escaping issues when local functions are
denoted with a $
* Remove C++ comments from the Alpha assembly.
* Add base assembly code for the non-inlined functions
on Alpha
This commit was SVN r11764.
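The timer rollover mentioned in this commit can be checked with simple arithmetic. The sketch below assumes a 32-bit cycle counter; the helper name and the clock rates are hypothetical values chosen for illustration, not figures taken from the commit.

```c
#include <stdint.h>

/* Hypothetical helper: seconds until a free-running counter of the given
 * width wraps around. Precondition: counter_bits < 64. */
static double rollover_seconds(unsigned counter_bits, double ticks_per_second)
{
    return (double)(UINT64_C(1) << counter_bits) / ticks_per_second;
}
```

For example, `rollover_seconds(32, 500e6)` is about 8.6 seconds, and faster clocks wrap sooner, so a timer built on such a counter cannot measure intervals longer than a few seconds without extra bookkeeping.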
CXXFLAGS are set to -Wall -Werror. Thanks to Ralf for the patch.
refs trac:290
This commit was SVN r11762.
The following Trac tickets were found above:
Ticket 290 --> https://svn.open-mpi.org/trac/ompi/ticket/290
Fix for double mutex free that would cause an abort condition in the orted
whenever threads were enabled.
This commit was SVN r11759.
The following Trac tickets were found above:
Ticket 391 --> https://svn.open-mpi.org/trac/ompi/ticket/391
should die, according to the MPI standard. It's possible that the
ORTE layer may kill additional processes, but that's beyond our
control and seems to be allowed by the standard (ie, it might also
end up killing all the procs in all the jobs covered by the
communicator).
* update the stack trace printing code to use the framework rather
than calling execinfo directly, so that we should be able to get
stack traces on all the platforms we support stack tracing on
(if the user wants stack traces on abort, of course)
This commit was SVN r11753.
tell if the remote proc should be in an exposure epoch or not.
Refs trac:325
This commit was SVN r11746.
The following Trac tickets were found above:
Ticket 325 --> https://svn.open-mpi.org/trac/ompi/ticket/325
epoch's control data could overwrite the previous epoch's data because
we were reusing data structures between PW and SC. Instead, we now
have explicit post_msg and complete_msg counters for completion.
refs trac:354
* Only register the rdma osc callback once, as it turns out that some
btls (MX) do something more than update a table during the register
call, and each register call sucks up valuable fragments...
This commit was SVN r11745.
The following Trac tickets were found above:
Ticket 354 --> https://svn.open-mpi.org/trac/ompi/ticket/354
LoadLeveler only sets LOADL_PROCESSOR_LIST when 128 or fewer tasks are allocated to a job. The POE RAS relied on this variable, so I created a new RAS that uses the LoadLeveler API instead of the environment variable. This still needs some testing, so for now we use the POE RAS whenever LOADL_PROCESSOR_LIST is set; otherwise we fall back on this component.
Unfortunately, this will require an autogen...
This commit was SVN r11732.
long ago) supposed to be used as a cache for accessing the PML procs. But in
all of the PMLs the PML proc contains only one field, i.e. a pointer to the ompi_proc.
This pointer can be accessed easily using the c_remote_group. Therefore, there is no
point in keeping the PML procs around. Slim fast commit ...
This commit was SVN r11730.
We were still waiting the entire duration of the timeout before we figured out that a connect() was successful. Re-introduce adding the peer_send_event so that we detect immediately when a connect() completes.
Also make sure to delete the timeout event in complete_connect().
Fixed a struct timeval initialization warning reported by Jeff.
Remove an erroneous opal_output().
This commit was SVN r11724.
The following SVN revision numbers were found above:
r11718 --> open-mpi/ompi@1b6231a9b5
Each 's' partition has its own TCP network. It's fine to use this network for jobs that fit inside the partition, but the TCP OOB errors when trying to connect across two partitions, because there are two disjoint networks. Each node also has another TCP network connecting ALL nodes together.
So the solution is to actually try all the available TCP interfaces on a node, instead of erroring when the first one fails.
Also, the default TCP connect() timeout is way too long (5 minutes) - use our own timeout mechanism, with the timeout value expressed as an MCA parameter.
This commit was SVN r11718.
if we want to be able to reuse the request. If not, the request will never be freed
even if the user calls MPI_Request_free.
This commit was SVN r11717.