Commit graph

521 commits

Author SHA1 Message Date
Andrey Maslennikov
63ba7bec46 platform/mellanox: disable missing libcuda warning
Signed-off-by: Andrey Maslennikov <andreyma@mellanox.com>
2019-09-22 16:02:57 +03:00
Mikhail Brinskii
404c480068 COLL/TUNED: Update alltoall selection rule for mlx
Use linear with sync alltoall algorithm for certain message/comm size
ranges. Does not affect default fixed decision, unless HPCX (with its
custom parameters) is used or corresponding mca is set.

Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2019-07-13 23:27:40 +03:00
Jeff Squyres
99553eb1b9 platform: Remove "with_verbs" from all the platform files.
Since --with-verbs has been removed, remove it from all the
platform files, too.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-02-07 05:36:06 -08:00
Jeff Squyres
16de1a990e contrib/platform: remove stale redstorm file
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-01-10 13:45:06 -08:00
Jeff Squyres
f86da9beee platform/contrib: remove stale "iu" directory
IU is no longer active in the Open MPI project.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-01-10 13:45:06 -08:00
Andrey Maslennikov
074e9cc92c platform/mellanox: disable btl-uct by default
Signed-off-by: Andrey Maslennikov <andreyma@mellanox.com>
2018-10-22 12:23:40 +03:00
Yossi Itigin
6c9a95df3e Merge pull request #5858 from amaslenn/mlnx-no-verbs
platform/mellanox: disable openib/verbs
2018-10-08 14:08:09 +03:00
Andrey Maslennikov
7180ab144a platform/mellanox: disable openib/verbs
Signed-off-by: Andrey Maslennikov <andreyma@mellanox.com>
2018-10-08 12:13:44 +03:00
Ralph Castain
952090854a Add intel/bend platform file
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-10-06 20:26:26 -07:00
Ralph Castain
1624f8090b Update intel/bend platform files
[skip ci]
bot:notest

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-10-05 15:06:20 -07:00
Yossi Itigin
a31dc5ddcb Merge pull request #5725 from amaslenn/platform-mellanox
platform/mellanox: cleanup autodetect config
2018-09-20 18:48:43 +03:00
Yossi Itigin
b18af26f4b Merge pull request #5726 from amaslenn/platform-mellanox-conf
platform/mellanox: update default configuration
2018-09-20 18:48:18 +03:00
Howard Pritchard
b9ac3d8931 SCIF: remove it
KNC is effectively dead.  Remove corresponding SCIF
support in Open MPI.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-09-19 10:39:52 -06:00
Andrey Maslennikov
da18a2d24c platform/mellanox: update default configuration
Signed-off-by: Andrey Maslennikov <andreyma@mellanox.com>
2018-09-18 09:51:47 +03:00
Andrey Maslennikov
ced50a98ff platform/mellanox: cleanup autodetect config
Signed-off-by: Andrey Maslennikov <andreyma@mellanox.com>
2018-09-18 09:47:51 +03:00
Ralph Castain
98b4ed9a3a Fix the no-disconnect test
A race condition exists based on whether or not the userdata object attached to a hwloc_obj_t has been initialized. These objects are set up whenever we scan for resources under that location. You therefore must not set a variable to the pointer to the userdata object and then call a function that will initialize the data in it. Instead, set the variable after the function call, and protect against a NULL pointer.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-19 13:52:34 -07:00
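The ordering problem described in this commit message can be sketched as follows. This is a Python analogy, not Open MPI or hwloc code: `TopoObject`, `scan_resources`, and the `available` field are hypothetical stand-ins for an object whose userdata is only created when resources are scanned.

```python
class TopoObject:
    """Hypothetical stand-in for a topology object with lazily created userdata."""

    def __init__(self):
        self.userdata = None  # not initialized until the first scan


def scan_resources(obj):
    """Hypothetical scan that creates the userdata on first use."""
    if obj.userdata is None:
        obj.userdata = {"available": 4}


def count_unsafe(obj):
    # Wrong order: the reference is captured before the scan initializes it.
    data = obj.userdata
    scan_resources(obj)
    # 'data' still refers to whatever was there before the scan (possibly None).
    return data["available"] if data is not None else None


def count_safe(obj):
    # Right order: scan first, read the reference afterwards, then guard it.
    scan_resources(obj)
    data = obj.userdata
    if data is None:
        return 0
    return data["available"]
```

On a fresh object, `count_unsafe` sees the stale empty reference, while `count_safe` reads the value set by the scan. In C the stale reference would be a NULL pointer, which is why the commit asks for the explicit guard.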
Ralph Castain
081a0d98eb Ignore the ud/oob component
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-18 13:25:16 -07:00
Ralph Castain
014bb3c8de Fix external hwloc builds
Remove spurious comma in header file definition. Remove unused variables

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-03 11:24:21 -07:00
Nathan Hjelm
85d1965a0f Merge pull request #4828 from hppritcha/topic/update_lanl_toss_platform
lanl/platform: add new toss2/3 platform files
2018-05-01 09:52:14 -06:00
Ralph Castain
538fd18fad Update default MCA params in platform file
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-03-25 17:14:01 -07:00
Howard Pritchard
8eb738a9c8 lanl/platform: add new toss2/3 platform files
remove old platform files
add new platform files for toss2/toss3
OPA/MLX-IB variants.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-02-17 10:32:07 -07:00
Howard Pritchard
406c4cc126 Merge pull request #4299 from hppritcha/topic/update_lanl_toss_platform_file
LANL/platform: disable use of XRC recv bufs
2017-10-06 09:31:17 -06:00
Howard Pritchard
1a639ec477 LANL/platform: disable use of XRC recv bufs
Forgot as part of #3970 to disable use of XRC
recv bufs by default in LANL platform config
file.

related to #4300

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-10-04 11:40:20 -06:00
Ralph Castain
4f932819aa Update platform file
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-09-22 16:05:57 -07:00
Ralph Castain
f7e8780a42 Remove fortran support from platform file
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-07-20 21:02:30 -07:00
Artem Polyakov
35f15a0ba5 contrib: Fix mellanox platform defaults (btl/sm -> btl/vader)
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-07-15 14:32:26 +07:00
Ralph Castain
243076dd8c Update gadget platform file
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-04-19 21:45:13 -06:00
Howard Pritchard
9350aa5d71 orte/ras: remove loadleveler support
Remove loadleveler as it is obsolescent and is no longer supported.

Fixes #3167

We'll wait for final check of whether or not loadleveler even
compiles/functions before merging this.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2017-03-21 10:32:28 -06:00
Ralph Castain
24e8639826 Platform file update
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-14 11:11:48 -06:00
Ralph Castain
6d6bc9bd07 Update alps module to new APIs
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-12 09:43:07 -07:00
Ralph Castain
48fc339718 Create an alternative mapping method that pushes responsibility
onto the backend daemons. By default, let mpirun only pack the app_context
info and send that to the backend daemons where the mapping will
be done. This significantly reduces the computational time on mpirun as it isn't
running up/down the topology tree computing thousands of binding
locations, and it reduces the launch message to a very small number of
bytes.

When running -novm, fall back to the old way of doing things
where mpirun computes the entire map and binding, and then sends
the full info to the backend daemon.

Add a new cmd line option/mca param --fwd-mpirun-port that allows
mpirun to dynamically select a port, but then passes that back to
all the other daemons so they will use that port as a static port
for their own wireup. In this mode, we no longer "phone home" directly
to mpirun, but instead use the static port to wireup at daemon
start. We then use the routing tree to rollup the initial
launch report, and limit the number of open sockets on mpirun's node.

Update ras simulator to track the new nidmap code

Cleanup some bugs in the nidmap regex code, and enhance the error message for not enough slots to include the host on which the problem is found.

Update gadget platform file

Initialize the range count when starting a new range

Fix the no-np case in managed allocation

Ensure DVM node usage gets cleaned up after each job

Update scaling.pl script to use --fwd-mpirun-port. Pre-connect the daemon to its parent during launch while we are otherwise waiting for the daemon's children to send their "phone home" rollup messages

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-03-07 20:43:12 -08:00
Ralph Castain
9f8f7f3189 Add CPPFLAGS to build of rml/ofi component.
Fix finalize to ensure we only destruct the msg queue list once.
Update platform file

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-02-25 09:17:41 -08:00
Ralph Castain
28abe78f8c Add new platform files. Modify scaling.pl to support ppn option
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-29 15:55:49 -08:00
Ralph Castain
649301a3a2 Revise the routed framework to be multi-select so it can support the new conduit system. Update all calls to rml.send* to the new syntax. Define an orte_mgmt_conduit for admin and IOF messages, and an orte_coll_conduit for all collective operations (e.g., xcast, modex, and barrier).
Still not completely done as we need a better way of tracking the routed module being used down in the OOB - e.g., when a peer drops connection, we want to remove that route from all conduits that (a) use the OOB and (b) are routed, but we don't want to remove it from an OFI conduit.
2016-10-23 21:52:39 -07:00
Ralph Castain
2f966bf3bf Cleanup external PMIx v3 component for copy/paste errors - component and module require unique names
2016-10-20 09:11:46 -07:00
Alina Sklarevich
a2be17ec14 Revert "mellanox/optimized: set enable_openib_rdmacm_ibaddr=yes in the mellanox/optimized file."
This reverts commit 6cd7282631.
2016-06-06 11:26:11 +03:00
Nathan Hjelm
1e6b4f2f55 Merge pull request #1495 from hjelmn/new_hooks
Add new patcher memory hooks
2016-04-13 18:19:23 -06:00
Nathan Hjelm
27f8a4e806 opal: add code patcher framework
This commit adds a framework to abstract runtime code patching.
Components in the new framework can provide functions for either
patching a named function or a function pointer. The latter
functionality is not being used but may provide a way to allow memory
hooks when dlopen functionality is disabled.

This commit adds two different flavors of code patching. The first is
provided by the overwrite component. This component overwrites the
first several instructions of the target function with code to jump to
the provided hook function. The hook is expected to provide the full
functionality of the hooked function.

The linux patcher component is based on the memory hooks in ucx. It
only works on linux and operates by overwriting function pointers in
the symbol table. In this case the hook is free to call the original
function using the function pointer returned by dlsym.

Both components restore the original functions when the patcher
framework closes.

Changes had to be made to support Power/PowerPC with the Linux
dynamic loader patcher. Some of the changes:

 - Move code necessary for powerpc/power support to the patcher
   base. The code is needed by both the overwrite and linux
   components.

 - Move patch structure down to base and move the patch list to
   mca_patcher_base_module_t. The structure has been modified to
   include a function pointer to the function that will unapply the
   patch. This allows the mixing of multiple different types of
   patches in the patch_list.

 - Update Linux patching code to keep track of the matching between
   GOT entry and original (unpatched) address. This allows us to
   completely clean up the patch on finalize.

All patchers keep track of the changes they made so that they can be
reversed when the patcher framework is closed.

At this time there are bugs in the Linux dynamic loader patcher so
its priority is lower than the overwrite patcher.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-13 17:16:13 -06:00
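The bookkeeping described in this commit message (a shared patch list in which every entry carries its own "unapply" function, so that closing the framework restores the originals) can be sketched as below. This is a Python analogy under stated assumptions, not the Open MPI patcher API: `PatcherBase`, `patch_attribute`, and the `lib.alloc` function are hypothetical names, and replacing an attribute stands in for overwriting a function pointer in a symbol table.

```python
from types import SimpleNamespace


class PatcherBase:
    """Hypothetical registry: each patch entry stores its own undo callable."""

    def __init__(self):
        self.patch_list = []

    def patch_attribute(self, owner, name, hook):
        """Replace owner.name with hook and return the original callable."""
        original = getattr(owner, name)
        setattr(owner, name, hook)

        def unapply():
            # Restore exactly what was there before this patch was applied.
            setattr(owner, name, original)

        self.patch_list.append((name, unapply))
        return original

    def close(self):
        """Undo all patches in reverse order of application."""
        while self.patch_list:
            _, unapply = self.patch_list.pop()
            unapply()


# Usage: hook a hypothetical allocator, call through to the original, then restore.
lib = SimpleNamespace(alloc=lambda n: bytearray(n))
seen = []
patcher = PatcherBase()
original_alloc = patcher.patch_attribute(
    lib, "alloc", lambda n: (seen.append(n), original_alloc(n))[1]
)
lib.alloc(3)      # goes through the hook, which records the size
patcher.close()   # the original function is back in place
lib.alloc(2)      # no longer recorded
```

Because each entry owns its undo function, the list can mix patches that are reversed in different ways, which is the property the commit message attributes to moving the patch structure into the base.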
Nathan Hjelm
b1670f844d contrib/platform: don't disable dlopen
The --enable-static gives us what we want: statically linked components.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-04-13 17:16:12 -06:00
Alina Sklarevich
6cd7282631 mellanox/optimized: set enable_openib_rdmacm_ibaddr=yes in the mellanox/optimized file.
2016-04-11 18:01:16 +03:00
Nathan Hjelm
147e780fa5 contrib/lanl: update platform files for TOSS2
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-17 14:30:50 -06:00
Mike Dubman
cdffe4f92d BUILD: update mellanox platform file
add support for UCX
2015-10-21 11:39:30 +03:00
Howard Pritchard
89b9be3732 lanl/platform: fixes to pick up lustre
Fixes to lanl platform files to pick up lustre header
files, etc. for romio and ompi i/o.

Fixes #1033

Thanks to Jerome Vienne for spotting this.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-10-15 14:32:21 -05:00
Ralph Castain
c1bbbb5e2f Remove the last involvement of the OOB system from the MPI layer, remove the no-longer-needed usock/oob component, and have procs no longer open the RML, OOB, ROUTED, and GRPCOMM frameworks as PMIx now provides all required app-mpirun cmds
2015-09-15 13:08:35 -07:00
Howard Pritchard
5eccba17af lanl: help out lanl admins
LANL admins want platform files and *.conf
files so oblige them.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-07-24 08:03:52 -07:00
Ralph Castain
75ceec663a Now that it has been officially released, update the embedded HWLOC to 1.11.0
2015-06-28 14:07:45 -07:00
Ralph Castain
869041f770 Purge whitespace from the repo
2015-06-23 20:59:57 -07:00
Ralph Castain
9a70765f27 Silence malloc(0) warnings reported by Lisandro
2015-05-12 12:38:58 -07:00
Mike Dubman
dede6fa1fb build: new options
- enable/disable knob for threads support
- disable rpath by default
2015-04-30 14:46:15 +03:00
Bert Wesarg
d01c5160df Remove any reference to VampirTrace in the platform files.
2015-01-22 08:08:08 +01:00