Hopefully, this now compiles for libnuma 0.9.x and libnuma 2.0.x.
Fixes for the strategy discussed in the commit message for r24442
(i.e., check against numa_get_mems_allowed(), which only exists in
libnuma 2.0.x) and the new new new plan on #2698 coming in a separate
commit.
This commit was SVN r24443.
The following SVN revision numbers were found above:
r24442 --> open-mpi/ompi@90a8fe4aad
(with libnuma-2.0.4 / LIBNUMA_API_VERSION 2): numa_get_run_node_mask
returns a struct bitmask *.
Whether it's a good idea to blindly pass that on to
numa_set_membind() is another matter: one might want to match against
the list returned by numa_get_mems_allowed(), which may be set by the
outside environment.
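A minimal sketch of the "match against the allowed list" idea, assuming the node sets are modelled as plain 64-bit masks rather than libnuma's struct bitmask *. The type demo_nodemask_t, the function demo_pick_membind() and the fall-back rule are hypothetical illustrations, not code from this commit or from libnuma:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for libnuma's struct bitmask: one bit per NUMA node. */
typedef uint64_t demo_nodemask_t;

/*
 * Restrict the nodes we are running on to the nodes the outside
 * environment allows.  If the two sets do not intersect, fall back to
 * the allowed set (an assumption of this sketch) instead of trying to
 * bind memory to a node we may not use.
 */
static demo_nodemask_t demo_pick_membind(demo_nodemask_t run_nodes,
                                         demo_nodemask_t mems_allowed)
{
    /* An empty allowed set leaves nothing to bind to. */
    assert(0 != mems_allowed);

    demo_nodemask_t both = run_nodes & mems_allowed;
    return (0 != both) ? both : mems_allowed;
}
```

With real libnuma 2.0.x, the two inputs would come from numa_get_run_node_mask() and numa_get_mems_allowed(), and the result would be what is handed to numa_set_membind().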
Refs trac:2698.
This commit was SVN r24442.
The following SVN revision numbers were found above:
r24421 --> open-mpi/ompi@31510e683b
The following Trac tickets were found above:
Ticket 2698 --> https://svn.open-mpi.org/trac/ompi/ticket/2698
Update the CMake script for checking mca subdirs.
Add windows support for __attribute__ packed structures.
Define usleep and posix_memalign with equivalent windows functions.
And a few minor fixes, type casts.
This commit was SVN r24429.
what memory node the process is running on (which is guaranteed to be
a good answer because maffinity won't be invoked unless the process is
already bound to a specific processor), and then bind our memory to
that.
Refs trac:2698.
This commit was SVN r24421.
The following SVN revision numbers were found above:
r24290 --> open-mpi/ompi@afa654746c
The following Trac tickets were found above:
Ticket 2698 --> https://svn.open-mpi.org/trac/ompi/ticket/2698
OMPI supports multiple different repository systems (SVN, hg, git).
But the VERSION file has listed "want_svn" and "svn_r" as fields, even
though the actual repo system and version may not be SVN.
So search/replace those fields (and derivative values that come from
those fields) with "want_repo_rev" and "repo_rev", respectively.
This commit was SVN r24405.
Add some new proc/job states
Rename a constant to reflect coming change - remove the arbitrary difference between restarting a proc locally and relocating it to another node in terms of the number of restarts allowed.
Add pretty-print of signals for "proc aborted due to signal" reports.
This commit was SVN r24378.
The following SVN revision numbers were found above:
r24371 --> open-mpi/ompi@93d28a5792
This means that the converters (opal_err2str, orte_err2str) can now
return NULL as a "silent error". The return value of opal_err2str_fn_t
is the status of the operation (OPAL_SUCCESS or OPAL_ERROR).
This fixes the "Unknown error" message issues on the trunk.
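A rough sketch of that converter contract, assuming a signature in which the string is handed back through an out-parameter and the return value is only the status. All demo_* names and the error codes are hypothetical and stand in for the real OPAL/ORTE symbols:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes standing in for OPAL_SUCCESS / OPAL_ERROR. */
#define DEMO_SUCCESS 0
#define DEMO_ERROR   (-1)

/* Hypothetical converter type: the return value is the status of the
 * lookup; the message (possibly NULL) comes back through *msg. */
typedef int (*demo_err2str_fn_t)(int errnum, const char **msg);

/* One layer's converter.  Code -2 is a "silent error": it is
 * recognized, but deliberately has no message. */
static int demo_layer_err2str(int errnum, const char **msg)
{
    assert(NULL != msg);
    switch (errnum) {
    case -1: *msg = "Error"; return DEMO_SUCCESS;
    case -2: *msg = NULL;    return DEMO_SUCCESS;
    default: *msg = NULL;    return DEMO_ERROR;   /* not our code */
    }
}

/* Ask each converter in turn.  "Unknown error" is used only when no
 * converter recognizes the code, not when one answers with NULL. */
static const char *demo_strerror(int errnum,
                                 const demo_err2str_fn_t *fns, size_t nfns)
{
    for (size_t i = 0; i < nfns; ++i) {
        const char *msg = NULL;
        if (DEMO_SUCCESS == fns[i](errnum, &msg)) {
            return msg;   /* may be NULL: caller prints nothing */
        }
    }
    return "Unknown error";
}
```

Separating the status from the string is what lets a caller tell "recognized, but stay quiet" apart from "nobody knows this code".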
This commit was SVN r24371.
Temporarily remove hwloc's internal version of myriexpress.h. It is
causing a problem when compiling Open MPI with MX support because
hwloc uses AC_CONFIG_HEADER in hwloc's hwloc.m4 to generate
opal/mca/paffinity/hwloc/hwloc/include/hwloc/config.h.
AC_CONFIG_HEADER apparently has the (undocumented) side effect of
adding -I$(top_builddir)/opal/mca/paffinity/hwloc/hwloc/include/hwloc
to OMPI's compilation flags. Hence, when the OMPI MX components are
compiled and #include "myriexpress.h" (or <myriexpress.h>) they see
hwloc's myriexpress.h before the system one. Badness ensues.
This removal is temporary because we need to figure out a better
solution. But for now, OMPI is not using hwloc's myriexpress.h file --
so it's safe to remove. I'll push this issue upstream to hwloc to
figure out a better solution...
This commit was SVN r24354.
The following Trac tickets were found above:
Ticket 2690 --> https://svn.open-mpi.org/trac/ompi/ticket/2690
Short Version:
--------------
Event engine needs to be flushed so it does not use old/stale file descriptors.
Long Version:
-------------
The problem was that the restarted process was waiting for the socket to the local daemon to finish establishing during the 'sync' operation. The core problem was that the daemon was sending a header of 36 bytes, but the restarted process only received 35 bytes of the message. So the restarted process became stuck waiting for the last byte to arrive.
After many hours of digging, I figured out that the event engine was using the same file descriptor for its evsig_cb functionality (to signal itself when a signal arrives). So when the daemon wrote into the new fd, the event engine was stealing the first byte (*shakes fist at event engine*) before the recv() could be posted.
The solution is to use the event_reinit() function on restart to re-establish the now-stale file descriptors in the event engine. This seems to have fixed the problem.
A few other minor things:
-------------------------
* Add a check to make sure the event engine is balanced in its init/finalize
* Add the opal_event_base_close() to the BLCR restart exec function (still not 100% sure it is needed, but there it is).
This commit was SVN r24296.
last December. :-(
Add new MCA param: maffinity_libnuma_policy. Thanks to David
Singleton for the suggestion. Here's the help text about it:
{{{
MCA maffinity: parameter "maffinity_libnuma_policy" (current value:
<loose>, data source: default value)
Binding policy that determines what happens if memory
is unavailable on the local NUMA node. A value of
"strict" means that the memory allocation will fail;
a value of "loose" means that the memory allocation
will spill over to another NUMA node.
}}}
This commit was SVN r24290.
either direct link to these basic predefined types, or a combination of them.
Anyway, the first items in the datatype list belong to OPAL, the second round
are MPI datatypes created by composing basic OPAL datatypes, and the last
batch are mapped datatypes (direct correspondence between an OMPI datatype and
an OPAL one such as int -> int32_t).
Modify the op to fit this new scheme.
This commit was SVN r24247.
the module to use the new hwloc bitmap API (the cpuset API is both
klunkier and deprecated), which simplified a few things.
This commit was SVN r24217.
{{{
base/paffinity_base_service.c: In function ‘opal_paffinity_base_cset2mapstr’:
base/paffinity_base_service.c:623: warning: unused variable ‘range_last’
base/paffinity_base_service.c:623: warning: unused variable ‘range_first’
base/paffinity_base_service.c:622: warning: unused variable ‘count’
base/paffinity_base_service.c:622: warning: unused variable ‘m’
}}}
{{{
connect/btl_openib_connect_oob.c: In function ‘init_ud_qp’:
connect/btl_openib_connect_oob.c:1111: warning: control reaches end of non-void function
connect/btl_openib_connect_oob.c: In function ‘init_device’:
connect/btl_openib_connect_oob.c:1235: warning: unused variable ‘i’
connect/btl_openib_connect_oob.c: In function ‘get_pathrecord_sl’:
connect/btl_openib_connect_oob.c:1323: warning: unused variable ‘i’
}}}
This commit was SVN r24196.
It is statically initialized to the real back-end OPAL show_help
function. During orte_show_help_init(), the variable is re-assigned
with the value of the back-end ORTE show_help function (the one that
does error message aggregation).
Therefore, anything that calls opal_show_help() after a certain point
in orte_init() will have their show_help messages be aggregated.
w00t! Even code down in OPAL -- that has no knowledge of ORTE -- will
have their messages aggregated. '''Double w00t!'''
During orte_show_help_finalize(), we restore the original pointer
value so that if something calls opal_show_help() after
orte_finalize(), it'll still work properly (but it won't be
aggregated).
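The pointer swap described above can be sketched roughly as follows, assuming the variable is a global function pointer. Every demo_* name is hypothetical; the real OPAL/ORTE signatures are not shown in this message:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical signature for a show_help-style function. */
typedef int (*demo_show_help_fn_t)(const char *topic);

static int demo_aggregated = 0;   /* messages seen by the upper layer */

/* Lower-layer back end: prints immediately. */
static int demo_base_show_help(const char *topic)
{
    printf("help: %s\n", topic);
    return 0;
}

/* Upper-layer back end: stands in for "aggregate instead of print". */
static int demo_aggregating_show_help(const char *topic)
{
    (void) topic;
    ++demo_aggregated;
    return 0;
}

/* Statically initialized to the lower-layer function, so it works
 * before any init routine has run. */
demo_show_help_fn_t demo_show_help = demo_base_show_help;
static demo_show_help_fn_t demo_saved = NULL;

/* Upper-layer init: remember the old target, install the new one. */
void demo_upper_init(void)
{
    demo_saved = demo_show_help;
    demo_show_help = demo_aggregating_show_help;
}

/* Upper-layer finalize: restore the original pointer. */
void demo_upper_finalize(void)
{
    assert(NULL != demo_saved);
    demo_show_help = demo_saved;
    demo_saved = NULL;
}
```

Callers only ever invoke demo_show_help(...), so lower-layer code picks up the aggregating behavior without knowing the upper layer exists.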
This commit was SVN r24185.
1. Remove it from libevent207.h because it is not needed.
2. Add compat to the include list so it can use queue.h when needed.
This commit was SVN r24144.
Change OMPI code in libevent to not use bool.
Add some comments to indicate OMPI specific code.
This should fix compiles on Sun Studio Solaris.
This commit was SVN r24062.
As memmove is slower than memcpy, I added the required logic to only use it when
really necessary.
Nothing changes from the developer's point of view: you should always call
opal_datatype_copy_content_same_ddt.
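One way to express "only use memmove when really necessary" is to test whether the two byte ranges overlap. This is an illustrative sketch, not the logic used inside the datatype engine; demo_copy() is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy len bytes from src to dst, paying for memmove() only when the
 * ranges overlap.  Addresses are compared as integers, which works on
 * the flat-address-space platforms this sketch assumes.
 */
static void *demo_copy(void *dst, const void *src, size_t len)
{
    assert(NULL != dst && NULL != src);

    uintptr_t d = (uintptr_t) dst;
    uintptr_t s = (uintptr_t) src;

    /* The ranges overlap when the start addresses are closer than len. */
    int overlap = (d > s) ? (d - s < len) : (s - d < len);

    return overlap ? memmove(dst, src, len) : memcpy(dst, src, len);
}
```

For example, shifting a buffer by two bytes within itself takes the memmove() path, while copying into a separate buffer takes the memcpy() path.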
This commit was SVN r24047.
It's not possible to combine two shared libraries on Windows, so we have to do it a bit differently. First generate a small static event library by just linking the object files, then link it into the other libraries that need the libevent API.
This commit was SVN r24039.
handles more modern versions of Autoconf's --program-transform
arguments. Also make it clear that the message is coming from Open
MPI logic, so that we don't blame Autoconf, Red Hat, or anyone else
next time!
This commit was SVN r24024.
The following Trac tickets were found above:
Ticket 2611 --> https://svn.open-mpi.org/trac/ompi/ticket/2611
1. create DLL A, export symbols of A, import nothing (A normally is OPAL)
should define _USRDLL , A_EXPORT
2. create DLL B, export symbols of B, import A.lib (B could be ORTE, OMPI or other ompi tools)
should define _USRDLL, B_EXPORT
3. create DLL C, import B.dll (C could be external libs or apps)
should define B_IMPORT
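The define scheme above is typically consumed by a header along these lines. This is a sketch only: DEMO_B_DECLSPEC and b_version() are hypothetical names, while B_EXPORT and B_IMPORT are the defines listed above:

```c
/* b.h (sketch): pick the right decoration for B's public symbols. */
#if defined(_WIN32) && defined(B_EXPORT)
   /* Building DLL B itself: export the symbol. */
#  define DEMO_B_DECLSPEC __declspec(dllexport)
#elif defined(_WIN32) && defined(B_IMPORT)
   /* Building a consumer such as C: import the symbol from B. */
#  define DEMO_B_DECLSPEC __declspec(dllimport)
#else
   /* Non-Windows builds, or a static build: no decoration needed. */
#  define DEMO_B_DECLSPEC
#endif

/* Public API of B, decorated according to who is compiling. */
DEMO_B_DECLSPEC int b_version(void);

/* b.c (sketch): in a real build this definition lives only in B's
 * sources, which are compiled with B_EXPORT defined. */
int b_version(void)
{
    return 2;
}
```

On non-Windows platforms the macro expands to nothing, so the same header can be shared by all builds.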
This commit was SVN r24016.
libevent creates its event-config.h during "make all" (vs. during
configure). The prior method around this didn't work because it wrote
an event-config.h.in in the source tree -- a Bad Idea(tm). The new
way uses AC_CONFIG_COMMANDS to get stuff executed at the end of
config.status to create event-config.h. This seems to work properly
during make distcheck.
This commit was SVN r23975.
Note: the ompi_check_libfca.m4 file had to be modified to avoid it stomping on global CPPFLAGS and the like. The file was also relocated to the ompi/config directory as it pertains solely to an ompi-layer component.
Forgive the mid-day configure change, but I know Shiqing is working the windows issues and don't want to cause him unnecessary redo work.
This commit was SVN r23966.
After talking with Brian, we're pretty sure that this is only because
really, really old libevent didn't allow bitwise or-ing of the other
loop types, because what we really need is (EVLOOP_ONCE |
EVLOOP_NONBLOCK). And that's what EVLOOP_ONELOOP did (i.e., we
changed the logic of libevent's event.c to let ONELOOP do both ONCE
and NONBLOCK things).
In the new libevent version, we didn't implement EVLOOP_ONELOOP
properly. As a result, we got hangs in the SM BTL add_procs
function. Note that the SM BTL wasn't to blame -- it was purely a
side-effect of bad ONELOOP integration (i.e., if you got past the SM
BTL add_procs, you may well have hung somewhere else).
This commit removes all ONELOOP customizations from event.c and
returns it to (almost) its original state from the libevent 2.0.7-rc
distribution. Everywhere in the code base where we used ONELOOP, we
now use (ONCE | NONBLOCK).
This commit was SVN r23957.
Setup the event API to support multiple bases in preparation for splitting the OMPI and ORTE events. Holding here pending shared memory resolution.
This commit was SVN r23943.