We currently save the hostname of a proc when we create the ompi_proc_t for it. This was originally done because the only method we had for discovering the host of a proc was to include that info in the modex, and we therefore had to store it somewhere proc-local. Obviously, this carried a memory penalty for storing all those strings, so we added a "cutoff" parameter so that we wouldn't collect hostnames above a certain number of procs.
Unfortunately, this still results in an 8-byte/proc memory cost: the opal_proc_t contained in the ompi_proc_t carries a char* pointer so that we can store the hostnames of the other procs if we fall below the cutoff. At scale, this can consume a fair amount of memory.
With the switch to relying on PMIx, there is no longer a need to cache the proc hostnames. Using the "optional" feature of PMIx_Get, we restrict the retrieval to be purely proc-local - i.e., we retrieve the info either via shared memory or from within the proc-internal hash storage (depending upon the active PMIx components). Thus, the retrieval of a hostname is purely a local operation involving no communication.
All RMs are required to provide a complete hostname map of all procs at startup. Thus, we have full access to all hostnames without including them in a modex or having to cache them on each proc. This allows us to remove the char* pointer from the opal_proc_t, saving 8 bytes/proc.
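As an illustration, here is a minimal sketch of such a purely local lookup using the standard PMIx client API (the wrapping function is hypothetical; PMIx_Get, PMIX_OPTIONAL, and PMIX_HOSTNAME are the real PMIx names):

    #include <stdbool.h>
    #include <string.h>
    #include <pmix.h>

    static char *fetch_local_hostname(const pmix_proc_t *peer)
    {
        pmix_info_t info;
        pmix_value_t *val = NULL;
        char *hostname = NULL;
        bool optional = true;

        /* PMIX_OPTIONAL restricts the lookup to what is already available
         * locally (shared memory or the proc-internal hash, depending on
         * the active components) - no communication with the server. */
        PMIX_INFO_LOAD(&info, PMIX_OPTIONAL, &optional, PMIX_BOOL);
        if (PMIX_SUCCESS == PMIx_Get(peer, PMIX_HOSTNAME, &info, 1, &val) &&
            NULL != val && PMIX_STRING == val->type) {
            hostname = strdup(val->data.string);  /* caller must free() */
            PMIX_VALUE_RELEASE(val);
        }
        PMIX_INFO_DESTRUCT(&info);
        return hostname;
    }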
Unfortunately, PMIx_Get does not currently support the return of a static pointer to memory. Thus, even though PMIx has the hostname in its memory, it can only return a malloc'd version of it. I have therefore ensured that the return from opal_get_proc_hostname is consistently malloc'd and free'd wherever used. This shouldn't be a burden as the hostname is only used in one of two circumstances:
(a) in an error message
(b) in a verbose output for debugging purposes
Thus, there should be no performance penalty associated with the malloc/free requirement. PMIx will eventually be returning static pointers, and so we can eventually simplify this method and return a "const char*" - but as noted, this really isn't an issue even today.
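In practice every call site reduces to the same get/use/free pattern. A representative sketch, assuming the embedded opal_proc_t is reachable as proc->super; the output text is illustrative:

    #include <stdlib.h>
    #include "opal/util/output.h"

    /* Sketch: the hostname returned by opal_get_proc_hostname() is always
     * heap-allocated, so the (rare) error/verbose paths free it after use. */
    static void complain_about_peer(ompi_proc_t *proc)
    {
        char *host = opal_get_proc_hostname(&proc->super);  /* malloc'd */
        opal_output(0, "communication with peer on node %s failed",
                    (NULL == host) ? "unknown" : host);
        free(host);  /* free(NULL) is a no-op */
    }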
Signed-off-by: Ralph Castain <rhc@pmix.org>
This reverts commit 6acebc40a194c92ab38a28553c2c8b04eb391820.
This patch is causing numerous "Socket closed" messages which are
causing most of the failures on Cisco's MTT run. See
https://github.com/open-mpi/ompi/issues/5849 for more information.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
When an error is returned by the socket operations, trigger the
appropriate error path in the PML to give an opportunity for
rerouting/error handling.
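A hedged sketch of the pattern, assuming the callback the PML registered through the BTL interface (the btl_error_cb member and MCA_BTL_ERROR_FLAGS_FATAL flag come from that interface; the function and message text are illustrative):

    #include "opal/mca/btl/btl.h"

    /* Sketch: on a fatal socket error (failed readv/writev/connect), hand
     * the failure to the PML through the error callback it registered, so
     * it gets a chance at rerouting or its own error handling. */
    static void tcp_endpoint_report_error(mca_btl_base_module_t *btl,
                                          opal_proc_t *errproc)
    {
        if (NULL != btl->btl_error_cb) {
            btl->btl_error_cb(btl, MCA_BTL_ERROR_FLAGS_FATAL, errproc,
                              (char *)"socket error on TCP endpoint");
        }
    }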
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
Some OSes have hardcoded limits to prevent overflowing an int32_t. We can either detect this at configure time (which might be a nicer but incomplete solution), or always force the pipelined protocol over TCP. As it only affects data larger than 1GB, no performance penalty is to be expected.
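To make the constraint concrete, a generic sketch (not the actual BTL code) of bounding a single writev below the limit such an OS accepts:

    #include <limits.h>
    #include <sys/uio.h>

    /* Sketch: some OSes reject (or truncate) a single writev whose total
     * length exceeds INT_MAX. Sending at most that much per call keeps us
     * inside the limit; the caller already loops until the fragment is
     * fully written, and anything above the 1GB pipeline threshold is
     * split into multiple fragments by the PML anyway. */
    static ssize_t bounded_writev(int fd, struct iovec *iov, int iovcnt)
    {
        size_t total = 0;
        int cnt;

        for (cnt = 0; cnt < iovcnt; cnt++) {
            if (total + iov[cnt].iov_len > (size_t)INT_MAX) {
                break;
            }
            total += iov[cnt].iov_len;
        }
        if (0 == cnt) {          /* first element alone exceeds the limit */
            iov[0].iov_len = (size_t)INT_MAX;
            cnt = 1;
        }
        return writev(fd, iov, cnt);
    }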
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
As writev and readv support a sum larger than a uint32_t, this version will work. For the other OSes a different patch is required. This patch is a slight modification of the one proposed by @ggouaillardet.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Due to the conversion from ssize_t to int we were losing bytes, and ended up writing outside the receive buffer. Similarly on the send side, due to the conversion to a narrower type, we could misinterpret the end of the fragment.
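An illustrative sketch of the fix (the function and variable names are hypothetical):

    #include <stddef.h>
    #include <sys/uio.h>

    /* Sketch of the fix: keep the readv result in a ssize_t. Assigning it
     * to an int truncates counts above 2^31-1 bytes, so the receive loop
     * advanced by the wrong amount and wrote past the end of the buffer. */
    static int drain_socket(int sd, struct iovec *iov, int iovcnt,
                            size_t *bytes_received)
    {
        ssize_t rc = readv(sd, iov, iovcnt);  /* NOT: int rc = readv(...); */
        if (rc < 0) {
            return -1;                        /* caller inspects errno */
        }
        *bytes_received += (size_t)rc;        /* no narrowing conversion */
        return 0;
    }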
Since pthreads are now mandatory, MCA_BTL_TCP_SUPPORT_PROGRESS_THREAD
is always true and hence can be safely removed.
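An illustrative before/after of the kind of dead guard this removes (the thread-start call and thread name are hypothetical):

    /* Before: the progress thread was guarded for builds without
     * pthread support. */
    #if MCA_BTL_TCP_SUPPORT_PROGRESS_THREAD
        opal_thread_start(&btl_tcp_progress_thread);
    #endif

    /* After: pthreads are unconditionally available, so the guarded code
     * stays and the always-true macro disappears. */
        opal_thread_start(&btl_tcp_progress_thread);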
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
We commonly see messages on the users list where a peer has hung up
because it has crashed. Instead of having just a BTL_ERROR message,
make this a real opal_show_help() message that tells the user that the
peer unexpectedly hung up, and they should look into *why* that peer
hung up.
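A sketch of the intended call; opal_show_help() is the real utility, while the help-file name, topic string, and arguments here are illustrative:

    #include "opal/util/show_help.h"

    /* Sketch: on an unexpected disconnect, emit an actionable help message
     * instead of a bare BTL_ERROR. File/topic names are illustrative. */
    static void report_peer_hangup(const char *local_host,
                                   const char *peer_host)
    {
        opal_show_help("help-mpi-btl-tcp.txt", "peer hung up",
                       true,  /* prepend the standard error header */
                       local_host, peer_host);
    }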
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Added an MCA parameter to turn the progress thread on/off.
Added a flag to check whether a BTL progress thread is in use.
Added a macro for the ob1 matching lock.
Updated the AUTHORS file.
All BTL-only operations (basically all data movements
with the exception of the matching operation) can now
be handled for the TCP BTL by a progress thread.
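A hedged sketch of the matching-lock idea, with all names hypothetical: matching must stay serialized between the application thread and a BTL progress thread, so the lock is only taken when such a thread is actually running:

    #include <stdint.h>
    #include "opal/threads/mutex.h"

    /* Hypothetical sketch: a counter of active BTL progress threads,
     * bumped when a BTL (e.g. TCP) spawns one. Matching must be
     * serialized against those threads, but the lock should cost
     * nothing when none exist. */
    extern volatile int32_t btl_progress_thread_count;

    #define OB1_MATCHING_LOCK(lock)                 \
        do {                                        \
            if (0 < btl_progress_thread_count) {    \
                opal_mutex_lock(lock);              \
            }                                       \
        } while (0)

    #define OB1_MATCHING_UNLOCK(lock)               \
        do {                                        \
            if (0 < btl_progress_thread_count) {    \
                opal_mutex_unlock(lock);            \
            }                                       \
        } while (0)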
WHAT: Open up our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down into OPAL
All the components required for inter-process communication are currently deeply integrated into the OMPI layer. Several groups/institutions have expressed interest in a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, accessible to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTLs directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purposes.

UTK, with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with a few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.
This commit was SVN r32317.