openmpi

Автор	SHA1	Сообщение	Дата
Rolf vandeVaart	3bb587937a	Just fix up some trailing spaces, tabs instead of spaces, missing periods on copyrights, extraneous spaces on blank lines. No actual code change. This commit was SVN r23739.	2010-09-10 21:01:52 +00:00
Josh Hursey	e12ca48cd9	A number of C/R enhancements per RFC below: http://www.open-mpi.org/community/lists/devel/2010/07/8240.php Documentation: http://osl.iu.edu/research/ft/ Major Changes: -------------- * Added C/R-enabled Debugging support. Enabled with the --enable-crdebug flag. See the following website for more information: http://osl.iu.edu/research/ft/crdebug/ * Added Stable Storage (SStore) framework for checkpoint storage * 'central' component does a direct to central storage save * 'stage' component stages checkpoints to central storage while the application continues execution. * 'stage' supports offline compression of checkpoints before moving (sstore_stage_compress) * 'stage' supports local caching of checkpoints to improve automatic recovery (sstore_stage_caching) * Added Compression (compress) framework to support * Add two new ErrMgr recovery policies * {{{crmig}}} C/R Process Migration * {{{autor}}} C/R Automatic Recovery * Added the {{{ompi-migrate}}} command line tool to support the {{{crmig}}} ErrMgr component * Added CR MPI Ext functions (enable them with {{{--enable-mpi-ext=cr}}} configure option) * {{{OMPI_CR_Checkpoint}}} (Fixes trac:2342) * {{{OMPI_CR_Restart}}} * {{{OMPI_CR_Migrate}}} (may need some more work for mapping rules) * {{{OMPI_CR_INC_register_callback}}} (Fixes trac:2192) * {{{OMPI_CR_Quiesce_start}}} * {{{OMPI_CR_Quiesce_checkpoint}}} * {{{OMPI_CR_Quiesce_end}}} * {{{OMPI_CR_self_register_checkpoint_callback}}} * {{{OMPI_CR_self_register_restart_callback}}} * {{{OMPI_CR_self_register_continue_callback}}} * The ErrMgr predicted_fault() interface has been changed to take an opal_list_t of ErrMgr defined types. This will allow us to better support a wider range of fault prediction services in the future. * Add a progress meter to: * FileM rsh (filem_rsh_process_meter) * SnapC full (snapc_full_progress_meter) * SStore stage (sstore_stage_progress_meter) * Added 2 new command line options to ompi-restart * --showme : Display the full command line that would have been exec'ed. * --mpirun_opts : Command line options to pass directly to mpirun. (Fixes trac:2413) * Deprecated some MCA params: * crs_base_snapshot_dir deprecated, use sstore_stage_local_snapshot_dir * snapc_base_global_snapshot_dir deprecated, use sstore_base_global_snapshot_dir * snapc_base_global_shared deprecated, use sstore_stage_global_is_shared * snapc_base_store_in_place deprecated, replaced with different components of SStore * snapc_base_global_snapshot_ref deprecated, use sstore_base_global_snapshot_ref * snapc_base_establish_global_snapshot_dir deprecated, never well supported * snapc_full_skip_filem deprecated, use sstore_stage_skip_filem Minor Changes: -------------- * Fixes trac:1924 : {{{ompi-restart}}} now recognizes path prefixed checkpoint handles and does the right thing. * Fixes trac:2097 : {{{ompi-info}}} should now report all available CRS components * Fixes trac:2161 : Manual checkpoint movement. A user can 'mv' a checkpoint directory from the original location to another and still restart from it. * Fixes trac:2208 : Honor various TMPDIR varaibles instead of forcing {{{/tmp}}} * Move {{{ompi_cr_continue_like_restart}}} to {{{orte_cr_continue_like_restart}}} to be more flexible in where this should be set. * opal_crs_base_metadata_write* functions have been moved to SStore to support a wider range of metadata handling functionality. * Cleanup the CRS framework and components to work with the SStore framework. * Cleanup the SnapC framework and components to work with the SStore framework (cleans up these code paths considerably). * Add 'quiesce' hook to CRCP for a future enhancement. * We now require a BLCR version that supports {{{cr_request_file()}}} or {{{cr_request_checkpoint()}}} in order to make the code more maintainable. Note that {{{cr_request_file}}} has been deprecated since 0.7.0, so we prefer to use {{{cr_request_checkpoint()}}}. * Add optional application level INC callbacks (registered through the CR MPI Ext interface). * Increase the {{{opal_cr_thread_sleep_wait}}} parameter to 1000 microseconds to make the C/R thread less aggressive. * {{{opal-restart}}} now looks for cache directories before falling back on stable storage when asked. * {{{opal-restart}}} also support local decompression before restarting * {{{orte-checkpoint}}} now uses the SStore framework to work with the metadata * {{{orte-restart}}} now uses the SStore framework to work with the metadata * Remove the {{{orte-restart}}} preload option. This was removed since the user only needs to select the 'stage' component in order to support this functionality. * Since the '-am' parameter is saved in the metadata, {{{ompi-restart}}} no longer hard codes {{{-am ft-enable-cr}}}. * Fix {{{hnp}}} ErrMgr so that if a previous component in the stack has 'fixed' the problem, then it should be skipped. * Make sure to decrement the number of 'num_local_procs' in the orted when one goes away. * odls now checks the SStore framework to see if it needs to load any checkpoint files before launching (to support 'stage'). This separates the SStore logic from the --preload-[binary\|files] options. * Add unique IDs to the named pipes established between the orted and the app in SnapC. This is to better support migration and automatic recovery activities. * Improve the checks for 'already checkpointing' error path. * A a recovery output timer, to show how long it takes to restart a job * Do a better job of cleaning up the old session directory on restart. * Add a local module to the autor and crmig ErrMgr components. These small modules prevent the 'orted' component from attempting a local recovery (Which does not work for MPI apps at the moment) * Add a fix for bounding the checkpointable region between MPI_Init and MPI_Finalize. This commit was SVN r23587. The following Trac tickets were found above: Ticket 1924 --> https://svn.open-mpi.org/trac/ompi/ticket/1924 Ticket 2097 --> https://svn.open-mpi.org/trac/ompi/ticket/2097 Ticket 2161 --> https://svn.open-mpi.org/trac/ompi/ticket/2161 Ticket 2192 --> https://svn.open-mpi.org/trac/ompi/ticket/2192 Ticket 2208 --> https://svn.open-mpi.org/trac/ompi/ticket/2208 Ticket 2342 --> https://svn.open-mpi.org/trac/ompi/ticket/2342 Ticket 2413 --> https://svn.open-mpi.org/trac/ompi/ticket/2413	2010-08-10 20:51:11 +00:00
Jeff Squyres	c8bb7537e7	Remove include/opal/sys/cache.h -- its only purpose in life was to #define CACHE_LINE_SIZE to 128. This name has a conflict on NetBSD, and it seems kinda odd to have a header file that ''only'' defines a single value. Also, we'll soon be raising hwloc to be a first-class item, so having this file around seemed kinda weird. Therefore, I replaced CACHE_LINE_SIZE with opal_cache_line_size, an int (in opal/runtime/opal_init.c and opal/runtime/opal.h) on the rationale that we can fill this in at runtime with hwloc info (trunk and v1.5/beyond, only). The only place we ''needed'' a compile-time CACHE_LINE_SIZE was in the BTL SM (for struct padding), so I made a new BTL_SM_ preprocessor macro with the old CACHE_LINE_SIZE value (128). That use isn't suitable for run-time hwloc information, anyway. This commit was SVN r23349.	2010-07-06 14:33:36 +00:00
Rolf vandeVaart	03b3e75f86	Add two arguments to the PML error callback function. This allows the BTL to specify a specific ompi_proc_t that had an error. Also add an optional descriptive string. Currently, arguments are not used but will be by future failover PML. Changes based on RFC. Reviewed by George Bosilca. This commit was SVN r23174.	2010-05-19 11:55:45 +00:00
Abhishek Kulkarni	afbe3e99c6	* Wrap all the direct error-code checks of the form (OMPI_ERR_* == ret) with (OMPI_ERR_* = OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a SOS-encoded error. The OPAL_SOS_GET_ERR_CODE() takes in a SOS error and returns back the native error code. * Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form (OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to decode 'ret' to get the native error code. This commit was SVN r23162.	2010-05-17 23:08:56 +00:00
George Bosilca	321213e779	Fix segmentation fault on heterogeneous architectures. Don't mess with the ompi_ptr_t by translating into void*. Instead keep it as an ompi_ptr_t all the way. Thanks to Timur Magomedov for helping to track down this issue and test the patch. cmr:v1.4 cmr:v1.5 This commit was SVN r23030.	2010-04-23 15:14:55 +00:00
Josh Hursey	e9b5162d79	Fix the configure logic for --with-ft so that it properly takes a comma separated list. Many of the OPAL_ENABLE_FT should be OPAL_ENABLE_FT_CR, so fix those. The OPAL Layer INC should call opal_output on restart so that it can refresh the string it prints to reflect the current pid/hostname which may have changed. This commit was SVN r22824.	2010-03-12 23:57:50 +00:00
Brian Barrett	07d49e982b	hdr_ctx is a uint16, so can have CIDs in range of 0 ... 2^16 - 1. I think someone (me?) must have done 2^(16 - 1) instead. Ooops. This commit was SVN r21869.	2009-08-22 05:21:01 +00:00
George Bosilca	51b2cfe40d	This header is required to compile the FT. This commit was SVN r21792.	2009-08-11 05:21:27 +00:00
George Bosilca	9c2b993589	Complete r21778 by adding the missing headers. This commit was SVN r21784. The following SVN revision numbers were found above: r21778 --> open-mpi/ompi@e4d52b16b5	2009-08-10 17:07:43 +00:00
Terry Dontje	e4d52b16b5	Add in eager limit checks in pmls. This commit was SVN r21778.	2009-08-10 12:46:20 +00:00
Rolf vandeVaart	c82e468ede	Undo revision r21767 - sorry folks This commit was SVN r21769. The following SVN revision numbers were found above: r21767 --> open-mpi/ompi@41f38110ff	2009-08-05 22:23:26 +00:00
Rolf vandeVaart	41f38110ff	HCA failover support in openib BTL This commit was SVN r21767.	2009-08-05 21:53:02 +00:00
Brian Barrett	736debcffc	Check during communicator creation that we didn't get assigned a CID we can't handle, so that the code aborts instead of hange. Refs trac:1904 This commit was SVN r21133. The following Trac tickets were found above: Ticket 1904 --> https://svn.open-mpi.org/trac/ompi/ticket/1904	2009-04-30 19:23:57 +00:00
Rainer Keller	221fb9dbca	... Delayed due to notifier commits earlier this day ... - Delete unnecessary header files using contrib/check_unnecessary_headers.sh after applying patches, that include headers, being "lost" due to inclusion in one of the now deleted headers... In total 817 files are touched. In ompi/mpi/c/ header files are moved up into the actual c-file, where necessary (these are the only additional #include), otherwise it is only deletions of #include (apart from the above additions required due to notifier...) - To get different MCAs (OpenIB, TM, ALPS), an earlier version was successfully compiled (yesterday) on: Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled This commit was SVN r21096.	2009-04-29 01:32:14 +00:00
George Bosilca	c148d33eb5	Play nicely with the reference count on the ompi_proc structure. This commit was SVN r20970.	2009-04-10 16:32:02 +00:00
George Bosilca	44ce610b8b	Add a comment to highlight the fact that this function reappend the FIN message to the pending list when the send fails. Therefore, any upper level function is not required to add it. Make sure we don't send the FIN twice. This commit was SVN r20952.	2009-04-07 16:48:58 +00:00
George Bosilca	ccb79b963f	This is the other half of the commit r20946 as I mess them up between two of my testing machines. The fix require both commits! This commit was SVN r20947. The following SVN revision numbers were found above: r20946 --> open-mpi/ompi@e2bb4c9b8f	2009-04-06 21:49:52 +00:00
George Bosilca	e2bb4c9b8f	Correct the handling of the pckt_pending list. The problem was that we returned the pck before coping the values out. With this change it seems to work at least on two architectures (even with the mpool size set back to 0). This commit was SVN r20946.	2009-04-06 21:45:08 +00:00
Rainer Keller	811f2bd9b4	- As discussed on RFC, move the ompi_bitmap to the opal layer. Add a check against a maximum (actually get rid of ifs internally to opal_bitmap.c) -- the functionality to set the current maximum size opal_bitmap_set_max_size() is currently only used in attribute.c to set the maximum OMPI_FORTRAN_HANDLE_MAX... Tested on linux/x86-64 with intel-tests with all_tests_no_perf_f run with 6 procs. Let's look into MTT as well... This commit was SVN r20708.	2009-03-03 22:25:13 +00:00
Rainer Keller	96e1b9b747	- Header orte/mca/rml/rml.h is not needed if no occurence of orte_rml or ORTE_RML. As the others compiles fine with -Wimplicit-function-declaration This commit was SVN r20639.	2009-02-26 03:52:31 +00:00
George Bosilca	4747a4bb53	ompi_comm_all allocate memory and retain the objects. Therefore, after each call to ompi_comm_all we should parse the communicator list and release the objects ... This commit was SVN r20525.	2009-02-11 21:48:11 +00:00
Jeff Squyres	679e2855b7	Fix CID 1135: the assignment to item was never used (it was overwritten in the next loop iteration "item = next_item"). This commit was SVN r20189.	2009-01-03 15:15:42 +00:00
George Bosilca	82d1d5d785	The patch for "Unexpected message queue for unknown CID's required" ticket #1460 . I'm unable to split it in two parts, my patch and Edgar's one. So I just update copyright information for both of us. What this patch do: - it use the unexpected queue create by commit r19562 to dispatch the unexpected message to the right communicator (once this communicator is created and initialized). - delay the PML comm_add until we have the context_id for the new communicator. - only do the PML comm_add on processes that really belong to the new communicator. Please read the lengthy comment in the source code for the reason behind this. This commit was SVN r19929. The following SVN revision numbers were found above: r19562 --> open-mpi/ompi@acd3406aa7	2008-11-04 21:58:06 +00:00
George Bosilca	9260f6b157	There is no reason to ask for an ACK from the BTL here. This commit was SVN r19789.	2008-10-22 20:13:33 +00:00
Josh Hursey	88aa45dd52	Commit to bring online OpenIB, MX, and shared memory support for Open MPI's checkpoint/restart functionality. Some tuning is still needed, but basic functionality is in place. There is still a problem with OpenIB and threads (external to C/R functionality). It has been reported in Ticket #1539 Additionally: * Fix a file cleanup bug in CRS Base. * Fix a possible deadlock in the TCP ft_event function * Add a mca_base_param_deregister() function to MCA base * Add whole process checkpoint timers * Add support for BTL: OpenIB, MX, Shared Memory * Add support Mpool: rdma, sm * Sundry bounds checking an cleanup in some scattered functions This commit was SVN r19756.	2008-10-16 15:09:00 +00:00
George Bosilca	325d006577	Mostly cleanups, and eventually a little bit more scalable add_procs. There was an argument that was barely used, and on return at the PML level it contained nothing usable. It has been removed, so now we're using less memory ... This commit was SVN r19657.	2008-09-30 15:47:43 +00:00
George Bosilca	acd3406aa7	Never drop messages. No never no more. This is supposed to fix the ticket #1460. This commit was SVN r19562.	2008-09-15 23:04:18 +00:00
Rainer Keller	c1f2b8e476	- Fix resource leak in case of error. Coverity CID1067 This commit was SVN r19168.	2008-08-06 08:04:27 +00:00
Ralph Castain	9613b3176c	Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach. I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive. This commit was SVN r18619.	2008-06-09 14:53:58 +00:00
George Bosilca	e361bcb64c	Send optimizations. 1. The send path get shorter. The BTL is allowed to return > 0 to specify that the descriptor was pushed to the networks, and that the memory attached to it is available again for the upper layer. The MCA_BTL_DES_SEND_ALWAYS_CALLBACK flag can be used by the PML to force the BTL to always trigger the callback. Unmodified BTL will continue to work as expected, as they will return OMPI_SUCCESS which force the PML to have exactly the same behavior as before. Some BTLs have been modified: self, sm, tcp, mx. 2. Add send immediate interface to BTL. The idea is to have a mechanism of allowing the BTL to take advantage of send optimizations such as the ability to deliver data "inline". Some network APIs such as Portals allow data to be sent using a "thin" event without packing data into a memory descriptor. This interface change allows the BTL to use such capabilities and allows for other optimizations in the future. All existing BTLs except for Portals and sm have this interface set to NULL. This commit was SVN r18551.	2008-05-30 03:58:39 +00:00
Galen Shipman	4da4c44210	Receive side changes, basically uses multiple active message callbacks rather than using a single receive callback followed by a switch on the header. Also fast pathed the matching for small fragments. This commit was SVN r18549.	2008-05-30 01:29:09 +00:00
Jeff Squyres	e7ecd56bd2	This commit represents a bunch of work on a Mercurial side branch. As such, the commit message back to the master SVN repository is fairly long. = ORTE Job-Level Output Messages = Add two new interfaces that should be used for all new code throughout the ORTE and OMPI layers (we already make the search-and-replace on the existing ORTE / OMPI layers): * orte_output(): (and corresponding friends ORTE_OUTPUT, orte_output_verbose, etc.) This function sends the output directly to the HNP for processing as part of a job-specific output channel. It supports all the same outputs as opal_output() (syslog, file, stdout, stderr), but for stdout/stderr, the output is sent to the HNP for processing and output. More on this below. * orte_show_help(): This function is a drop-in-replacement for opal_show_help(), with two differences in functionality: 1. the rendered text help message output is sent to the HNP for display (rather than outputting directly into the process' stderr stream) 1. the HNP detects duplicate help messages and does not display them (so that you don't see the same error message N times, once from each of your N MPI processes); instead, it counts "new" instances of the help message and displays a message every ~5 seconds when there are new ones ("I got X new copies of the help message...") opal_show_help and opal_output still exist, but they only output in the current process. The intent for the new orte_* functions is that they can apply job-level intelligence to the output. As such, we recommend that all new ORTE and OMPI code use the new orte_* functions, not thei opal_* functions. === New code === For ORTE and OMPI programmers, here's what you need to do differently in new code: * Do not include opal/util/show_help.h or opal/util/output.h. Instead, include orte/util/output.h (this one header file has declarations for both the orte_output() series of functions and orte_show_help()). * Effectively s/opal_output/orte_output/gi throughout your code. Note that orte_output_open() takes a slightly different argument list (as a way to pass data to the filtering stream -- see below), so you if explicitly call opal_output_open(), you'll need to slightly adapt to the new signature of orte_output_open(). * Literally s/opal_show_help/orte_show_help/. The function signature is identical. === Notes === * orte_output'ing to stream 0 will do similar to what opal_output'ing did, so leaving a hard-coded "0" as the first argument is safe. * For systems that do not use ORTE's RML or the HNP, the effect of orte_output_* and orte_show_help will be identical to their opal counterparts (the additional information passed to orte_output_open() will be lost!). Indeed, the orte_* functions simply become trivial wrappers to their opal_* counterparts. Note that we have not tested this; the code is simple but it is quite possible that we mucked something up. = Filter Framework = Messages sent view the new orte_* functions described above and messages output via the IOF on the HNP will now optionally be passed through a new "filter" framework before being output to stdout/stderr. The "filter" OPAL MCA framework is intended to allow preprocessing to messages before they are sent to their final destinations. The first component that was written in the filter framework was to create an XML stream, segregating all the messages into different XML tags, etc. This will allow 3rd party tools to read the stdout/stderr from the HNP and be able to know exactly what each text message is (e.g., a help message, another OMPI infrastructure message, stdout from the user process, stderr from the user process, etc.). Filtering is not active by default. Filter components must be specifically requested, such as: {{{ $ mpirun --mca filter xml ... }}} There can only be one filter component active. = New MCA Parameters = The new functionality described above introduces two new MCA parameters: * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that help messages will be aggregated, as described above. If set to 0, all help messages will be displayed, even if they are duplicates (i.e., the original behavior). * '''orte_base_show_output_recursions''': An MCA parameter to help debug one of the known issues, described below. It is likely that this MCA parameter will disappear before v1.3 final. = Known Issues = * The XML filter component is not complete. The current output from this component is preliminary and not real XML. A bit more work needs to be done to configure.m4 search for an appropriate XML library/link it in/use it at run time. * There are possible recursion loops in the orte_output() and orte_show_help() functions -- e.g., if RML send calls orte_output() or orte_show_help(). We have some ideas how to fix these, but figured that it was ok to commit before feature freeze with known issues. The code currently contains sub-optimal workarounds so that this will not be a problem, but it would be good to actually solve the problem rather than have hackish workarounds before v1.3 final. This commit was SVN r18434.	2008-05-13 20:00:55 +00:00
Josh Hursey	da2f1c58e2	Some checkpoint/restart cleanup. * Remove the opal_only option. This was suffering from bit rot, and no one uses it. It can be added back fairly easily if wanted. * Cleanup metadata interactions at the local level. * Touch up some of the INC funcitonality (fix typos and a minor ordering issue) This commit was SVN r18416.	2008-05-08 18:47:47 +00:00
Josh Hursey	dcd21d7d07	Some checkpoint/restart fixes in response to r18338 (changes in modex). Things should be working now. This commit was SVN r18348. The following SVN revision numbers were found above: r18338 --> open-mpi/ompi@3e55fe6f6d	2008-05-01 17:48:13 +00:00
Ralph Castain	3e55fe6f6d	Fold in the revised modex scheme. Move the ompi_proc_t modex portions to the RTE level since the daemons already have that info. Provide each process with the equivalent of a "nidmap" - both a map of what nodes are in the job, and a map of which node each process is on. This enables the use of static ports, though that hasn't been turned "on" in this commit. Update the rsh tree spawn capability so we spawn the next wave of daemons before launching our own local procs. Add an ability to encode nodenames for large clusters with contiguous node name numbering schemes - this allows communication of all node names in a few bytes instead of tens-of-bytes/node. This commit was SVN r18338.	2008-04-30 19:49:53 +00:00
Josh Hursey	2c736873bb	Fix a checkpoint/restart bug that causes a restarted application to occasionally throw a SIGSEGV or SIGPIPE due to invalid socket descriptors. The problem was caused by a bad ordering between the restart of the ORTE level tcp connections (in the OOB - out-of-band communication) and the Open MPI level tcp connections (BTLs). Before this commit ORTE would shutdown and restart the OOB completely before the OMPI level restarted its tcp connections. What would happen is that a socket descriptor used by the OMPI level on checkpoint was assigned to the ORTE level on restart. But the OMPI level had no knowledge that the socket descriptor it was previously using has been recycled so it closed it on restart. This caused the ORTE level to break as the newly created socket descriptor was closed without its knowledge. The fix is to have the OMPI level shutdown tcp connections, allow the ORTE level to restart, and then allow the OMPi level to restart its connections. This seems obvious, and I'm surprised that this bug has not cropped up sooner. I'm confident that this specific problem has been fixed with this commit. Thanks to Eric Roman and Tamer El Sayed for their help in identifying this problem, and patience while I was fixing it. * Add a new state {{{OPAL_CRS_RESTART_PRE}}}. This state identifies when we are on the down slope of the INC (finalize-like) which is useful when you want to close, but not reopen a component set for fear of interfering with a lower level. * Use this new state in OMPI level coordination. Here we want to make sure to play well with both the OMPI/BTL/TCP and ORTE/OOB/TCP components. * Update ft_event functions in PML and BML to handle the new restart state. * Add an additional flag to the error output in OOB/TCP so we can see what the socket descriptor was on failure as this can be helpful in debugging. This commit was SVN r18276.	2008-04-24 17:54:22 +00:00
Josh Hursey	cc83d41ad9	Merge in tmp/jjh-scratch {{{ svn merge -r 18218:18240 https://svn.open-mpi.org/svn/ompi/tmp/jjh-scratch . }}} Contains: * Primarily a fix for a user reported problem where a cached file descriptor is causing a SIGPIPE on restart. * Cleanup some small memory leaks from using mca_base_param_env_var() - Thanks Jeff * Cleanup ORTE FT tool compilation in non-FT builds - Thanks Tim P. * Cleanup mpi interface with missplaced {{{OPAL_CR_ENTER_LIBRARY}}} - Thanks Terry * Some other sundry cleanup items all dealing with C/R functionality in the trunk. This commit was SVN r18241.	2008-04-23 00:17:12 +00:00
Ralph Castain	3a0d09300b	Fully implement the inbound binomial allgather for daemon-based collectives. Supports both modex and barrier operations. Comm_spawn still uses the rank=0 method - shifting that algo to the daemons is under study. This commit was SVN r18115.	2008-04-09 22:10:53 +00:00
Gleb Natapov	cf40674369	Decide if sends should be throttled at the receiver and pass this to the sender in an ACK message. The decision can't be done reliably at the sender. This commit was SVN r17987.	2008-03-27 08:56:43 +00:00
Ralph Castain	d70e2e8c2b	Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer This commit was SVN r17632.	2008-02-28 01:57:57 +00:00
George Bosilca	fa31ec81d0	Add the ownership flags to the PML/BTL interface. The layer owning the descriptor is responsible for releasing it once the descriptor is not in use anymore. This commit was SVN r17497.	2008-02-18 17:39:30 +00:00
Gleb Natapov	5cd38b8b06	Better encapsulate heterogeneous arch handling in ob1. This commit was SVN r16970.	2007-12-16 08:45:44 +00:00
Gleb Natapov	e2e211f23b	Add flags parameter to btl_alloc() and btl_prepare_src() functions. If BTL knows at the time of allocation priority of a descriptor it may do some optimizations. This commit was SVN r16901.	2007-12-09 14:08:01 +00:00
Gleb Natapov	2d784752dd	Remove descriptor caching form BML. With descriptor caching some optimizations are impossible. This commit was SVN r16897.	2007-12-09 13:58:17 +00:00
Rich Graham	27a748e7eb	change all instances of ompi_free_list_init to ompi_free_list_init_new. Header and payload data are specified separately at this stage. This commit was SVN r16633.	2007-11-01 23:38:50 +00:00
Ralph Castain	54b2cf747e	These changes were mostly captured in a prior RFC (except for #2 below) and are aimed specifically at improving startup performance and setting up the remaining modifications described in that RFC. The commit has been tested for C/R and Cray operations, and on Odin (SLURM, rsh) and RoadRunner (TM). I tried to update all environments, but obviously could not test them. I know that Windows needs some work, and have highlighted what is know to be needed in the odls process component. This represents a lot of work by Brian, Tim P, Josh, and myself, with much advice from Jeff and others. For posterity, I have appended a copy of the email describing the work that was done: As we have repeatedly noted, the modex operation in MPI_Init is the single greatest consumer of time during startup. To-date, we have executed that operation as an ORTE stage gate that held the process until a startup message containing all required modex (and OOB contact info - see #3 below) info could be sent to it. Each process would send its data to the HNP's registry, which assembled and sent the message when all processes had reported in. In addition, ORTE had taken responsibility for monitoring process status as it progressed through a series of "stage gates". The process reported its status at each gate, and ORTE would then send a "release" message once all procs had reported in. The incoming changes revamp these procedures in three ways: 1. eliminating the ORTE stage gate system and cleanly delineating responsibility between the OMPI and ORTE layers for MPI init/finalize. The modex stage gate (STG1) has been replaced by a collective operation in the modex itself that performs an allgather on the required modex info. The allgather is implemented using the orte_grpcomm framework since the BTL's are not active at that point. At the moment, the grpcomm framework only has a "basic" component analogous to OMPI's "basic" coll framework - I would recommend that the MPI team create additional, more advanced components to improve performance of this step. The other stage gates have been replaced by orte_grpcomm barrier functions. We tried to use MPI barriers instead (since the BTL's are active at that point), but - as we discussed on the telecon - these are not currently true barriers so the job would hang when we fell through while messages were still in process. Note that the grpcomm barrier doesn't actually resolve that problem, but Brian has pointed out that we are unlikely to ever see it violated. Again, you might want to spend a little time on an advanced barrier algorithm as the one in "basic" is very simplistic. Summarizing this change: ORTE no longer tracks process state nor has direct responsibility for synchronizing jobs. This is now done via collective operations within the MPI layer, albeit using ORTE collective communication services. I -strongly- urge the MPI team to implement advanced collective algorithms to improve the performance of this critical procedure. 2. reducing the volume of data exchanged during modex. Data in the modex consisted of the process name, the name of the node where that process is located (expressed as a string), plus a string representation of all contact info. The nodename was required in order for the modex to determine if the process was local or not - in addition, some people like to have it to print pretty error messages when a connection failed. The size of this data has been reduced in three ways: (a) reducing the size of the process name itself. The process name consisted of two 32-bit fields for the jobid and vpid. This is far larger than any current system, or system likely to exist in the near future, can support. Accordingly, the default size of these fields has been reduced to 16-bits, which means you can have 32k procs in each of 32k jobs. Since the daemons must have a vpid, and we require one daemon/node, this also restricts the default configuration to 32k nodes. To support any future "mega-clusters", a configuration option --enable-jumbo-apps has been added. This option increases the jobid and vpid field sizes to 32-bits. Someday, if necessary, someone can add yet another option to increase them to 64-bits, I suppose. (b) replacing the string nodename with an integer nodeid. Since we have one daemon/node, the nodeid corresponds to the local daemon's vpid. This replaces an often lengthy string with only 2 (or at most 4) bytes, a substantial reduction. (c) when the mca param requesting that nodenames be sent to support pretty error messages, a second mca param is now used to request FQDN - otherwise, the domain name is stripped (by default) from the message to save space. If someone wants to combine those into a single param somehow (perhaps with an argument?), they are welcome to do so - I didn't want to alter what people are already using. While these may seem like small savings, they actually amount to a significant impact when aggregated across the entire modex operation. Since every proc must receive the modex data regardless of the collective used to send it, just reducing the size of the process name removes nearly 400MBytes of communication from a 32k proc job (admittedly, much of this comm may occur in parallel). So it does add up pretty quickly. 3. routing RML messages to reduce connections. The default messaging system remains point-to-point - i.e., each proc opens a socket to every proc it communicates with and sends its messages directly. A new option uses the orteds as routers - i.e., each proc only opens a single socket to its local orted. All messages are sent from the proc to the orted, which forwards the message to the orted on the node where the intended recipient proc is located - that orted then forwards the message to its local proc (the recipient). This greatly reduces the connection storm we have encountered during startup. It also has the benefit of removing the sharing of every proc's OOB contact with every other proc. The orted routing tables are populated during launch since every orted gets a map of where every proc is being placed. Each proc, therefore, only needs to know the contact info for its local daemon, which is passed in via the environment when the proc is fork/exec'd by the daemon. This alone removes ~50 bytes/process of communication that was in the current STG1 startup message - so for our 32k proc job, this saves us roughly 32k50 = 1.6MBytes sent to 32k procs = 51GBytes of messaging. Note that you can use the new routing method by specifying -mca routed tree - if you so desire. This mode will become the default at some point in the future. There are a few minor additional changes in the commit that I'll just note in passing: propagation of command line mca params to the orteds - fixes ticket #1073. See note there for details. * requiring of "finalize" prior to "exit" for MPI procs - fixes ticket #1144. See note there for details. * cleanup of some stale header files This commit was SVN r16364.	2007-10-05 19:48:23 +00:00
Shiqing Fan	a0660f4deb	- Just some type casts. This commit was SVN r16100.	2007-09-12 15:29:58 +00:00
Gleb Natapov	5596aa5f53	The sizes of mca_pml_ob1_send_request_t and mca_pml_ob1_recv_request_t depend on a parameter and are determined in runtime. r15346 removed calculation of correct sizes for this structures. This patch adds it back and fixes trac:1116, #1114. This commit was SVN r15932. The following SVN revision numbers were found above: r15346 --> open-mpi/ompi@433f8a7694 The following Trac tickets were found above: Ticket 1116 --> https://svn.open-mpi.org/trac/ompi/ticket/1116	2007-08-20 12:06:27 +00:00
Mohamad Chaarawi	59a7bf8a9f	Merging in the Sparse Groups.. This commit includes config changes.. This commit was SVN r15764.	2007-08-04 00:41:26 +00:00
Brian Barrett	39a6057fc6	A number of improvements / changes to the RML/OOB layers: * General TCP cleanup for OPAL / ORTE * Simplifying the OOB by moving much of the logic into the RML * Allowing the OOB RML component to do routing of messages * Adding a component framework for handling routing tables * Moving the xcast functionality from the OOB base to its own framework Includes merge from tmp/bwb-oob-rml-merge revisions: r15506, r15507, r15508, r15510, r15511, r15512, r15513 This commit was SVN r15528. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r15506 r15507 r15508 r15510 r15511 r15512 r15513	2007-07-20 01:34:02 +00:00
George Bosilca	e19777e910	A more consistent version. As we now share the send and receive queue, we have to construct/destruct only once. Therefore, the construction will happens before digging for a PML, while the destruction just before finalizing the component. Add some OPAL_LIKELY/OPAL_UNLIKELY. This commit was SVN r15347.	2007-07-10 23:45:23 +00:00
Brian Barrett	8b9e8054fd	Move modex from pml base to general ompi runtime, sicne it's used by more than just the PML/BTLs these days. Also clean up the code so that it handles the situation where not all nodes register information for a given node (rather than just spinning until that node sends information, like we do today). Includes r15234 and r15265 from the /tmp/bwb-modex branch. This commit was SVN r15310. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r15234 r15265	2007-07-09 17:16:34 +00:00
George Bosilca	60319f99ac	Make sure in case of error what we return is clean (set to NULL). This commit was SVN r15251.	2007-07-01 16:17:43 +00:00
Gleb Natapov	54b40aef91	Schedule SEND traffic of pipeline protocol between BTLs in accordance with relative bandwidths of each BTL. Precalculate what part of a message should be send via each BTL in advance instead of doing it during scheduling. This commit was SVN r15248.	2007-07-01 11:34:23 +00:00
Josh Hursey	7fd1805e97	Fix a couple of compile warnings that Tim P brought to by attention. This commit was SVN r15132.	2007-06-19 00:46:16 +00:00
Gleb Natapov	10266fb467	Fix deadlock in OB1 protocol by by sending memory by copying if registration fails. This commit was SVN r14842.	2007-06-03 08:31:58 +00:00
Gleb Natapov	a25e1e7b15	Implement new function mca_pml_ob1_send_requst_copy_in_out(req, offset, len) that allows to send any range of a request by send/recv instaed of RDMA and use it to send data from the end of a request in pipeline protocol. This commit was SVN r14841.	2007-06-03 08:30:07 +00:00
Gleb Natapov	06bf5d74e7	Remove mca_pml_ob1_send_fin_btl function. This commit was SVN r14784.	2007-05-28 06:51:12 +00:00
Gleb Natapov	ad69d3c6ac	Fix out of resource handling for FIN packets broken by r14768. This commit was SVN r14780. The following SVN revision numbers were found above: r14768 --> open-mpi/ompi@3401bd2b07	2007-05-27 08:29:38 +00:00
Galen Shipman	3401bd2b07	Add optional ordering to the BTL interface. This is required to tighten up the BTL semantics. Ordering is not guaranteed, but, if the BTL returns a order tag in a descriptor (other than MCA_BTL_NO_ORDER) then we may request another descriptor that will obey ordering w.r.t. to the other descriptor. This will allow sane behavior for RDMA networks, where local completion of an RDMA operation on the active side does not imply remote completion on the passive side. If we send a FIN message after local completion and the FIN is not ordered w.r.t. the RDMA operation then badness may occur as the passive side may now try to deregister the memory and the RDMA operation may still be pending on the passive side. Note that this has no impact on networks that don't suffer from this limitation as the ORDER tag can simply always be specified as MCA_BTL_NO_ORDER. This commit was SVN r14768.	2007-05-24 19:51:26 +00:00
Ralph Castain	4fff584a68	Commit the orted-failed-to-start code. This correctly causes the system to detect the failure of an orted to start and allows the system to terminate all procs/orteds that did start. The primary change that underlies all this is in the OOB. Specifically, the problem in the code until now has been that the OOB attempts to resolve an address when we call the "send" to an unknown recipient. The OOB would then wait forever if that recipient never actually started (and hence, never reported back its OOB contact info). In the case of an orted that failed to start, we would correctly detect that the orted hadn't started, but then we would attempt to order all orteds (including the one that failed to start) to die. This would cause the OOB to "hang" the system. Unfortunately, revising how the OOB resolves addresses introduced a number of additional problems. Specifically, and most troublesome, was the fact that comm_spawn involved the immediate transmission of the rendezvous point from parent-to-child after the child was spawned. The current code used the OOB address resolution as a "barrier" - basically, the parent would attempt to send the info to the child, and then "hold" there until the child's contact info had arrived (meaning the child had started) and the send could be completed. Note that this also caused comm_spawn to "hang" the entire system if the child never started... The app-failed-to-start helped improve that behavior - this code provides additional relief. With this change, the OOB will return an ADDRESSEE_UNKNOWN error if you attempt to send to a recipient whose contact info isn't already in the OOB's hash tables. To resolve comm_spawn issues, we also now force the cross-sharing of connection info between parent and child jobs during spawn. Finally, to aid in setting triggers to the right values, we introduce the "arith" API for the GPR. This function allows you to atomically change the value in a registry location (either divide, multiply, add, or subtract) by the provided operand. It is equivalent to first fetching the value using a "get", then modifying it, and then putting the result back into the registry via a "put". This commit was SVN r14711.	2007-05-21 18:31:28 +00:00
Ralph Castain	18b2dca51c	Bring in the code for routing xcast stage gate messages via the local orteds. This code is inactive unless you specifically request it via an mca param oob_xcast_mode (can be set to "linear" or "direct"). Direct mode is the old standard method where we send messages directly to each MPI process. Linear mode sends the xcast message via the orteds, with the HNP sending the message to each orted directly. There is a binomial algorithm in the code (i.e., the HNP would send to a subset of the orteds, which then relay it on according to the typical log-2 algo), but that has a bug in it so the code won't let you select it even if you tried (and the mca param doesn't show, so you'd really have to try). This also involved a slight change to the oob.xcast API, so propagated that as required. Note: this has only been tested on rsh, SLURM, and Bproc environments (now that it has been transferred to the OMPI trunk, I'll need to re-test it [only done rsh so far]). It should work fine on any environment that uses the ORTE daemons - anywhere else, you are on your own... :-) Also, correct a mistake where the orte_debug_flag was declared an int, but the mca param was set as a bool. Move the storage for that flag to the orte/runtime/params.c and orte/runtime/params.h files appropriately. This commit was SVN r14475.	2007-04-23 18:41:04 +00:00
Josh Hursey	12e5d0e817	ft_event Commit: - Move the PML Modex stuff out of the BML -- Abstraction violation. - Also fix the location of the add_procs with respect to the stage gates. This commit was SVN r14422.	2007-04-19 03:05:12 +00:00
Josh Hursey	8f119d9063	Closes trac:977 Fix for memory corruption in the restarted process stack. This stemed from the brute force method we were previously using. This commit fixes this by using a lighter weight solution focused in the r2 BML instead of above the PML. This is a more efficient and flexible solution, and it solves the original problem. In the process I pulled out the ft_event function in the tcp BTL and r2 BML into a set of *_ft.[c\|h] files just to keep any updates to these code paths as isolated as possible to make merging easier on everyone. This commit was SVN r14371. The following SVN revision numbers were found above: r2 --> open-mpi/ompi@58fdc18855 The following Trac tickets were found above: Ticket 977 --> https://svn.open-mpi.org/trac/ompi/ticket/977	2007-04-14 02:06:05 +00:00
Josh Hursey	38547459ae	Improve the cleanup process in ob1 Remove a redundant statement in the r2 BML. This commit was SVN r14228. The following SVN revision numbers were found above: r2 --> open-mpi/ompi@58fdc18855	2007-04-05 17:37:29 +00:00
Josh Hursey	98fb9f26ef	Some cleanup. - Remove an old comment from crcp_base_fns.c - Let ob1 have its very own ft_event function (which I'll fill in shortly) - Make sure ob1 finalizes the bsend stuff so we don't leave a bunch of memory sitting around - PML base - destruct the array upon finalize. Shrink the include search so it stops after finding a match This commit was SVN r14222.	2007-04-05 13:52:05 +00:00
Galen Shipman	ace68b1883	Change the way we handle unexpected messages, if less than or equal pml_ob1_unexpected_limit just buffer in the PML level recv fragment else allocate a buffer via the bucket allocator This commit was SVN r14117.	2007-03-22 01:00:34 +00:00
Josh Hursey	dadca7da88	Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD). This merge adds Checkpoint/Restart support to Open MPI. The initial frameworks and components support a LAM/MPI-like implementation. This commit follows the risk assessment presented to the Open MPI core development group on Feb. 22, 2007. This commit closes trac:158 More details to follow. This commit was SVN r14051. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r13912 The following Trac tickets were found above: Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158	2007-03-16 23:11:45 +00:00
Brian Barrett	041beeb1b6	Share currently selected PML in the modex information, then check whenever adding new procs that the remote proc's pml is the same as our local pml. Turns the hangs from mismatched PMLs into an abort, which is better, I think. This commit was SVN r13582.	2007-02-09 16:38:16 +00:00
George Bosilca	0ff2115964	Other warnings are now silenced. This commit was SVN r13462.	2007-02-02 06:47:35 +00:00
Brian Barrett	48ec0b2071	Revert out r12974, 12976, and 12991 as George has provided a less intrusive fix for now... This commit was SVN r12997. The following SVN revision numbers were found above: r12974 --> open-mpi/ompi@27cea44a9c	2007-01-04 22:07:37 +00:00
Brian Barrett	27cea44a9c	Fix a number of issues with the ompi_ptr_t: * Make sure that the pval always writes to the correct portion of the lval. This only matters on 32 bit big endian machines. * On 32 bit machines when assigning to pval, the other 4 bytes of lval weren't being written, which could lead to bogus data We use macros so that there aren't casts all over the code and the pval assignment can occur to the correct 4 bytes. Refs trac:587 This commit was SVN r12974. The following Trac tickets were found above: Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587	2007-01-03 19:47:48 +00:00
Brian Barrett	441432950f	Merge in changes from the bwb-heterogeneous temp branch (r12491 - r12714) for supporting compilers / architectures with different padding rules. This commit was SVN r12749. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r12491 r12714	2006-12-04 20:11:42 +00:00
Gleb Natapov	30ca7457b4	Some BTLs (e.g TCP) can report put/get completion before data actually hits the buffer on the other side. For this kind of BTLs we need to send FIN through the same BTL, PUT was performed with so network will handle ordering for us. If we will use another BTL, receiver can get FIN before data will hit the buffer and complete request prematurely. We mark such problematic BTLs with MCA_BTL_FLAGS_FAKE_RDMA flag (this kind of RDMA is really fake, because the real one guaranties that sender will see the completion only after receiver's NIC confirmed that all the data was received). This commit was SVN r12732.	2006-12-03 10:12:09 +00:00
Gleb Natapov	90be664b9f	Some process_pending() functions get bml_btl on which resource was freed as a parameter. For optimisation purpose only this BTL is used to send packet through instead of trying to send packets through all BTLs. But actually the code was wrong. It simply used provided bml_btl and it may represent different endpoint from packet's destination. The fixed code checks if packet's destination is reachable through the BTL, finds appropriate bml_btl and only then tries to send it through correct bml_btl. This commit was SVN r12319.	2006-10-26 13:21:47 +00:00
George Bosilca	06563b5dec	Last set of explicit conversions. We are now close to the zero warnings on all platforms. The only exceptions (and I will not deal with them anytime soon) are on Windows: - the write functions which require the length to be an int when it's a size_t on all UNIX variants. - all iovec manipulation functions where the iov_len is again an int when it's a size_t on most of the UNIXes. As these only happens on Windows, so I think we're set for now :) This commit was SVN r12215.	2006-10-20 03:57:44 +00:00
George Bosilca	688a16ea78	A long time waiting patch. Get rid of the comm->c_pml_procs. It was (and that was long ago) supposed to be used as a cache for accessing the PML procs. But in all of the PMLs the PML proc contain only one field i.e. a pointer to the ompi_proc. This pointer can be accessed using the c_remote_group easily. Therefore, there is no meaning of keeping the PML procs around. Slim fast commit ... This commit was SVN r11730.	2006-09-20 22:14:46 +00:00
Galen Shipman	e809a442e7	add the error handler registration to OB1.. This commit was SVN r11234.	2006-08-16 20:56:22 +00:00
Gleb Natapov	91f48f9a79	Merge with gleb-pml branch. Add out of resource handling support to PML layer. If resource is not available request is added to one of the pending list and retried later. This commit was SVN r10900.	2006-07-20 14:44:35 +00:00
Brian Barrett	47725c9b02	* Add new PML (CM) and network drivers (MTL) for high speed interconnects that provide matching logic in the library. Currently includes support for MX and some support for Portals * Fix overuse of proc_pml pointer on the ompi_proc structuer, splitting into proc_pml for pml data and proc_bml for the BML endpoint data * bug fixes in bsend init code, which wasn't being used by the OB1 or DR PMLs... This commit was SVN r10642.	2006-07-04 01:20:20 +00:00
Tim Woodall	12e502b10d	use correct loop index This commit was SVN r9356.	2006-03-21 18:18:22 +00:00
Galen Shipman	5600932c2f	fix misc warnings This commit was SVN r9339.	2006-03-20 15:41:45 +00:00
Tim Woodall	bd870519fd	- modified convertor copy_and_prepare routines to accept an addition flag, new flags to be included when convertor is initialized - modified pml/btl module defs and added stub functions for diagnostic output routines to dump state of queues / endpoints - updates to data reliability pml This commit was SVN r9329.	2006-03-17 18:46:48 +00:00
Rainer Keller	5102571c02	- Get rid of the temporary reachability bitmap. This commit was SVN r9163.	2006-02-27 11:06:01 +00:00
Brian Barrett	566a050c23	Next step in the project split, mainly source code re-arranging - move files out of toplevel include/ and etc/, moving it into the sub-projects - rather than including config headers with <project>/include, have them as <project> - require all headers to be included with a project prefix, with the exception of the config headers ({opal,orte,ompi}_config.h mpi.h, and mpif.h) This commit was SVN r8985.	2006-02-12 01:33:29 +00:00
George Bosilca	0376dce258	Keep track of the ompi_proc in the comm_proc. This avoid a lookup for the processor and simplify the execution path. The peer proc (ompi_proc_t) is set at the matching stage. This commit was SVN r8962.	2006-02-10 18:55:43 +00:00
Jeff Squyres	42ec26e640	Update the copyright notices for IU and UTK. This commit was SVN r7999.	2005-11-05 19:57:48 +00:00
Tim Woodall	4fc5b2105a	this is currently an int - we shouldn't restrict it unless required This commit was SVN r7895.	2005-10-27 17:06:58 +00:00
Edgar Gabriel	5d7fbd9d2e	minor change in bml_r2_add_procs: the memory for the bml_endpoints structure has to be allocated outside of the routine. Thus, the update version of pml/ob1/oml_ob1.c This commit was SVN r7739.	2005-10-12 20:59:25 +00:00
Brian Barrett	b7ef094766	* the cid in the header is only 16 bits, so limit our max cid to what can fit in there. This commit was SVN r7639.	2005-10-05 15:43:28 +00:00
Tim Woodall	a74ca0062a	reductions to initial memory footprint This commit was SVN r7455.	2005-09-21 19:10:56 +00:00
George Bosilca	2e3764a181	Remove the unused variable p + something that SVN figure out and where I dont see any difference, certainly just some conversion from TAB to space ... This commit was SVN r7287.	2005-09-10 04:06:49 +00:00
George Bosilca	0ad973afdd	Do not modex sned and receive the proc architecture. This is now done outside the PML in the proc init and was added there few weeks ago by Ralph. This commit was SVN r7282.	2005-09-09 22:21:57 +00:00
Tim Woodall	2a9ab3eb10	move obj construct back into open This commit was SVN r7057.	2005-08-26 20:28:42 +00:00
Galen Shipman	b01ebf45c9	Fixed build error related to direct call (bml_direct_call.h). Misc bug fixes and compiler warning issues. Fixed threaded build issue. This commit was SVN r6819.	2005-08-12 14:08:40 +00:00
Galen Shipman	c3c83aa3e1	BML (BTL Managment Layer). Allows BTL's to be used outside of the PML. See bml.h and PML-OB1 for usage. This commit was SVN r6815.	2005-08-12 02:41:14 +00:00
Brian Barrett	95fd068ffa	remove hard coded constants for value of MPI_TAG_UB and the max CID and add the values to the PML structure. This will allow PMLs that want to do hardware matching at the cost of a smaller range of valid tags and cids. Updated all the places that used the MPI_TAG_UB_VALUE constant to instead look at the pml struct. This commit was SVN r6778.	2005-08-09 14:56:04 +00:00
George Bosilca	8b93cb7661	Rename all the functions starting with mca_base_modex to mca_pml_base_modex. Change all the places where they are used to fit the new name. Remove the code to check the remote arch from the PML. We will have a GPR mechanism in ompi_mpi_initialize to do that. This commit was SVN r6750.	2005-08-05 18:03:30 +00:00
Tim Woodall	2324f36065	call ob1 progress rather than endpoint - as it may not have one This commit was SVN r6696.	2005-08-01 22:30:25 +00:00

1 2 3 4

159 Коммитов