Heterogeneous memory integration #3200

gcongiu · 2018-06-26T19:32:04Z

This PR introduces NUMA-awareness and heterogeneous memory support by adding the following features to MPICH:

Memory locality detection: detect NUMA nodes close to the hardware thread the current MPI process is running on;
Shared memory partitioning: allocate a separate shared memory segment (VMA) for every NUMA node;
Memory binding setting: set memory binding for a shared memory segment to the requested memory node.

These three features combined allow MPICH to partition a shared memory object (e.g., fastboxes in point-to-point) by the number of NUMA nodes in the system. Processes will then have their memory objects bound to the closest NUMA node.
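Here is a minimal sketch of the partitioning step. It assumes each local process already knows the NUMA node it runs on (in practice that id would come from hwloc and be exchanged among local processes). Every NUMA node that hosts at least one process gets its own segment, and each process gets a slot inside its node's segment. The names `numa_layout_t`, `group_by_numa`, and `MAX_NUMA` are hypothetical and are not part of MPICH.

```c
#include <assert.h>

#define MAX_NUMA 64 /* hypothetical upper bound on NUMA node ids */

/* Hypothetical layout: one shared segment per NUMA node in use. */
typedef struct {
    int num_segments;          /* distinct NUMA nodes used by local processes */
    int seg_of_node[MAX_NUMA]; /* NUMA node id -> segment index, or -1 if unused */
} numa_layout_t;

/* Give each NUMA node that hosts at least one local process a segment index
 * (in order of first appearance), and record for every local rank the
 * segment it belongs to and its slot inside that segment. */
static void group_by_numa(const int *numa_of_rank, int nranks,
                          numa_layout_t *layout, int *seg_of_rank,
                          int *slot_of_rank)
{
    int count_in_seg[MAX_NUMA] = { 0 };

    layout->num_segments = 0;
    for (int n = 0; n < MAX_NUMA; n++)
        layout->seg_of_node[n] = -1;

    for (int r = 0; r < nranks; r++) {
        int node = numa_of_rank[r];
        assert(node >= 0 && node < MAX_NUMA);
        if (layout->seg_of_node[node] < 0)
            layout->seg_of_node[node] = layout->num_segments++;
        int seg = layout->seg_of_node[node];
        seg_of_rank[r] = seg;
        slot_of_rank[r] = count_in_seg[seg]++;
    }
}
```

For example, with `numa_of_rank = {1, 1, 0, 1, 0}` this yields two segments. Ranks 0, 1, and 3 share segment 0 (slots 0 to 2), and ranks 2 and 4 share segment 1. Numbering segments by first appearance keeps segment ids dense even when node ids are sparse.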

REF: #3511

gcongiu · 2018-08-14T22:06:03Z

test:jenkins/ch3/most

gcongiu · 2018-08-15T02:59:52Z

test:jenkins/ch3/tcp

gcongiu · 2018-09-02T03:42:13Z

test:jenkins/ch3/ofi

gcongiu · 2018-09-04T15:11:16Z

test:jenkins/ch3/ofi

gcongiu · 2018-09-04T15:59:51Z

test:jenkins/ch3/ofi

gcongiu · 2018-10-04T15:44:39Z

test:jenkins/ch3/most

gcongiu · 2018-10-04T17:35:11Z

test:jenkins/ch3/most

gcongiu · 2019-02-12T02:06:04Z

test:jenkins/ch4/ofi

gcongiu · 2019-02-12T03:51:48Z

@raffenet @pavanbalaji CH4 tests now complete correctly (for some reason Jenkins is failing to collect the summary and thus is not showing the tests as passed). There are still a few corrections I need to make to the code, but before doing so I would like to get your comments.

pavanbalaji · 2019-02-12T10:24:39Z

This exposes fastboxes and copy buffers to the MPI layer. The abstraction needs to be improved.

gcongiu · 2019-02-12T15:44:46Z

> This exposes fastboxes and copy buffers to the MPI layer. The abstraction needs to be improved.

Should I move mpir_memkind.h to the common device layer instead (i.e., mpidu_memkind.h)? Concerning MPIR_Process, is it OK to have additional numa_info data in there? Or should this also be moved to the device layer and reside in another global data structure?

pavanbalaji · 2019-02-12T15:48:47Z

> Should I move mpir_memkind.h to the common device layer instead (i.e., mpidu_memkind.h)?

Not everything is device specific. So you might need to split it.

> Concerning MPIR_Process, is it OK to have additional numa_info data in there? Or should this also be moved to the device layer and reside in another global data structure?

Why not generalize it to expose topology in a more uniform fashion instead of hardcoding just "node" and "numa"? We now use hwloc at the MPI layer, so we might as well use those objects.

gcongiu · 2019-02-13T20:13:44Z

test:jenkins/ch4/ofi

gcongiu · 2019-02-13T22:02:00Z

test:jenkins/ch4/ofi

gcongiu · 2019-02-14T01:05:27Z

test:jenkins/ch4/ofi

gcongiu · 2019-02-14T02:30:23Z

Running additional tests now that OFI works. @raffenet I still haven't figured out why report collection is failing.
test:jenkins/ch4/ucx
test:jenkins/ch3/tcp

raffenet · 2019-02-14T16:05:58Z

test/mpi/pt2pt/testlist.def

@@ -50,3 +50,11 @@ dtype_send 2
 recv_any 2
 irecv_any 2
 large_tag 2
+
+# Heterogeneous memory tests
+sendflood 8 env=MPIR_CVAR_MEMBIND_NUMA_ENABLE="YES" env=MPIR_CVAR_MEMBIND_TYPE_LIST="FASTBOXES:AUTO" env=MPIR_CVAR_MEMBIND_POLICY_LIST="FASTBOXES:BIND" env=MPIR_CVAR_MEMBIND_FLAGS_LIST="FASTBOXES:STRICT" timeLimit=600


It looks like these additional quotes are causing the XML parser to barf. Can you remove them? E.g.

sendflood 8 env=MPIR_CVAR_MEMBIND_NUMA_ENABLE=YES env=MPIR_CVAR_MEMBIND_TYPE_LIST=FASTBOXES:AUTO env=MPIR_CVAR_MEMBIND_POLICY_LIST=FASTBOXES:BIND env=MPIR_CVAR_MEMBIND_FLAGS_LIST=FASTBOXES:STRICT timeLimit=600
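Without the quotes, each `*_LIST` value is a bare `OBJECT:SETTING` pair such as `FASTBOXES:AUTO`. The sketch below shows one way such a value could be split. It assumes that entries are comma-separated when more than one object is listed. The separator, the fixed-size buffers, and the names `membind_entry_t` and `parse_membind_list` are assumptions for illustration and do not reflect MPICH's actual CVAR parsing code.

```c
#include <assert.h>
#include <string.h>

/* One parsed "OBJECT:SETTING" entry, e.g. "FASTBOXES:BIND". */
typedef struct {
    char object[32];
    char setting[32];
} membind_entry_t;

/* Parse a comma-separated list such as "FASTBOXES:AUTO,WIN:STRICT".
 * Returns the number of entries parsed, or -1 on a malformed entry or
 * when more than max_entries entries are present. */
static int parse_membind_list(const char *list, membind_entry_t *out, int max_entries)
{
    int n = 0;
    const char *p = list;

    assert(list != NULL && out != NULL);
    while (*p != '\0') {
        const char *end = strchr(p, ',');               /* end of this entry */
        size_t len = end ? (size_t) (end - p) : strlen(p);
        const char *colon = memchr(p, ':', len);        /* OBJECT/SETTING split */

        if (n == max_entries || colon == NULL)
            return -1;
        size_t obj_len = (size_t) (colon - p);
        size_t set_len = len - obj_len - 1;
        if (obj_len == 0 || set_len == 0 ||
            obj_len >= sizeof(out[n].object) || set_len >= sizeof(out[n].setting))
            return -1;

        memcpy(out[n].object, p, obj_len);
        out[n].object[obj_len] = '\0';
        memcpy(out[n].setting, colon + 1, set_len);
        out[n].setting[set_len] = '\0';
        n++;

        p = end ? end + 1 : p + len;                    /* skip the comma, if any */
    }
    return n;
}
```

Malformed entries (no colon, an empty object or setting, or overlong names) make the whole list fail with -1. They are not silently skipped.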

gcongiu · 2019-02-14T16:38:22Z

test:jenkins/ch4/ofi

This is a refactoring patch for the shared memory segment allocation functions. `MPIDU_shm_seg_alloc` now also takes a memory type parameter defining the target memory in which the requested allocation should be placed. `MPIDU_shm_seg_commit` now also takes an hwloc NUMA node logical identifier and a shared memory object identifier. The NUMA node id allows binding memory to a specific memory domain, while the object identifier allows user-defined binding information for that specific object. The patch also introduces shared memory object definitions in `src/mpid/common/shm/mpidu_shm_obj.h`, and definitions of the memory types to which objects can be bound in `src/include/mpir_memtype.h`.
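The two-phase pattern can be sketched as follows. The sketch assumes that, as in the existing code, the alloc call only records a request and the commit call materializes the segment. Each request here also carries a memory type, and the layout step places all requests of one type contiguously, with alignment padding. `seg_alloc`, `seg_commit_layout`, and the other names are hypothetical. The actual mapping and hwloc binding are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical memory types mirroring the MPIR_MEMTYPE__* enum. */
typedef enum { MEMTYPE_DDR = 0, MEMTYPE_MCDRAM, MEMTYPE_NUM } memtype_t;

#define MAX_REQUESTS 16

/* A recorded allocation request; offset is filled in at commit time. */
typedef struct {
    size_t len;
    memtype_t memtype;
    size_t offset;
} seg_request_t;

typedef struct {
    seg_request_t req[MAX_REQUESTS];
    int nreq;
} seg_pending_t;

/* Phase 1: only record the request; nothing is mapped yet.
 * Returns the request index, or -1 if the table is full. */
static int seg_alloc(seg_pending_t *p, size_t len, memtype_t memtype)
{
    if (p->nreq == MAX_REQUESTS)
        return -1;
    p->req[p->nreq] = (seg_request_t) { len, memtype, 0 };
    return p->nreq++;
}

/* Phase 2 (layout only): place every request of the given memory type
 * back to back, aligning each start to `align` bytes (a power of two),
 * and return the total segment size needed for that type. */
static size_t seg_commit_layout(seg_pending_t *p, memtype_t memtype, size_t align)
{
    size_t off = 0;

    assert(align != 0 && (align & (align - 1)) == 0);
    for (int i = 0; i < p->nreq; i++) {
        if (p->req[i].memtype != memtype)
            continue;
        off = (off + align - 1) & ~(align - 1);   /* round up to alignment */
        p->req[i].offset = off;
        off += p->req[i].len;
    }
    return off;
}
```

Keeping the layout separate from the mapping means the total size per memory type is known before any segment is created. That is what allows one segment per target memory to be created and bound as a unit.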

This patch introduces support for NUMA architectures, including detection and use of heterogeneous memory, e.g., KNL MCDRAM. It adds functionality to detect NUMA nodes of different types and to set up the information needed to bind allocated objects to different types of memory.
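One way to tell the node types apart is sketched below. It relies on the fact that, on KNL in flat or hybrid mode, MCDRAM is exposed as NUMA nodes without any CPUs. Treating every CPU-less node as MCDRAM is a simplifying assumption that would not hold on all systems. The distance row would normally come from the system's NUMA distance matrix (for example via hwloc). The types and functions here are hypothetical.

```c
#include <assert.h>

typedef enum { MEMTYPE_DDR = 0, MEMTYPE_MCDRAM } memtype_t;

/* Hypothetical per-node description, e.g. filled in from hwloc. */
typedef struct {
    int num_cpus; /* cores attached to this NUMA node */
} numa_node_t;

/* Heuristic (assumption): on KNL in flat/hybrid mode, MCDRAM shows up
 * as NUMA nodes with no CPUs, while DDR nodes own the cores. */
static memtype_t classify_node(const numa_node_t *node)
{
    return node->num_cpus == 0 ? MEMTYPE_MCDRAM : MEMTYPE_DDR;
}

/* Return the node of the requested type with the smallest distance from
 * the caller's node (distance[i] = distance from the local node to node i),
 * or -1 if no node of that type exists. Ties go to the lower index. */
static int nearest_node_of_type(const numa_node_t *nodes, const int *distance,
                                int num_nodes, memtype_t want)
{
    int best = -1;

    assert(num_nodes >= 0);
    for (int i = 0; i < num_nodes; i++) {
        if (classify_node(&nodes[i]) != want)
            continue;
        if (best < 0 || distance[i] < distance[best])
            best = i;
    }
    return best;
}
```

Returning -1 when no node of the requested type exists lets the caller fall back to the default memory type, e.g., on systems without MCDRAM.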

This patch modifies the previous fbox segment allocation mechanism to make it NUMA- and heterogeneous-memory-aware. This is done by counting the number of available NUMA nodes used by MPI processes and creating an equal number of shared memory segments (instead of just one). Each of these segments contains the fbox elements for the processes located on the corresponding NUMA node and can be bound to the requested type of memory (i.e., DDR or MCDRAM).
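Here is a sketch of where each fastbox could live after the split. It makes two assumptions. First, NUMA node ids are already dense (0 to num_nodes-1, one segment each). Second, the fastbox for a sender/receiver pair is placed in the receiver's segment, so the polling process reads local memory. Both assumptions, and all names below, are illustrative and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Location of one fastbox: which per-NUMA segment, and byte offset in it. */
typedef struct {
    int segment;
    size_t offset;
} fbox_loc_t;

/* Position of `rank` among the local ranks on the same NUMA node. */
static int pos_in_node(const int *node_of_rank, int rank)
{
    int pos = 0;
    for (int i = 0; i < rank; i++)
        if (node_of_rank[i] == node_of_rank[rank])
            pos++;
    return pos;
}

/* Assumption: the fastbox for sender->receiver lives in the receiver's
 * segment, at row pos_in_node(receiver) and column sender. */
static fbox_loc_t fbox_location(const int *node_of_rank, int num_local,
                                int sender, int receiver, size_t fbox_size)
{
    fbox_loc_t loc;

    assert(sender != receiver && sender < num_local && receiver < num_local);
    loc.segment = node_of_rank[receiver];
    loc.offset = ((size_t) pos_in_node(node_of_rank, receiver) * num_local + sender)
                 * fbox_size;
    return loc;
}

/* Size of the segment for `node`: one row of num_local fastboxes per
 * receiver on that node (the diagonal slot is simply left unused). */
static size_t fbox_segment_size(const int *node_of_rank, int num_local,
                                int node, size_t fbox_size)
{
    size_t receivers = 0;
    for (int i = 0; i < num_local; i++)
        if (node_of_rank[i] == node)
            receivers++;
    return receivers * num_local * fbox_size;
}
```

Take four local processes on nodes `{0, 1, 0, 1}` with 64-byte fastboxes. Each segment then holds 2 × 4 = 8 fastboxes (512 bytes), and the fastbox from rank 0 to rank 3 sits at offset 256 in segment 1.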

Similarly to the pt2pt fastbox integration, this patch decomposes the current single shared segment into multiple segments, one per NUMA node, each of which can then be bound separately using hwloc. Moreover, when using the symmetric heap (symheap), either all segment allocations succeed or none of them does: if a symheap segment allocation fails, all previous allocations must be reverted. To accomplish this, a new function, `MPIDI_CH4R_release_shm_symheap`, has been introduced.
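The all-or-nothing rule can be sketched with an allocator/release pair passed in as function pointers. On the first failure, every segment already obtained is released in reverse order. The function-pointer types, `alloc_all_or_nothing`, and the budget-limited `malloc` wrapper used to simulate failures are hypothetical stand-ins. They do not reproduce `MPIDI_CH4R_release_shm_symheap`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical hooks; a real implementation would map one symmetric-heap
 * segment per NUMA node instead of calling malloc. */
typedef void *(*seg_alloc_fn)(size_t len, void *ctx);
typedef void (*seg_free_fn)(void *seg, void *ctx);

/* Allocate num_segs segments. On any failure, release the ones already
 * obtained (newest first), reset those slots to NULL, and return -1. */
static int alloc_all_or_nothing(void **segs, int num_segs, size_t len,
                                seg_alloc_fn alloc, seg_free_fn release, void *ctx)
{
    assert(num_segs >= 0);
    for (int i = 0; i < num_segs; i++) {
        segs[i] = alloc(len, ctx);
        if (segs[i] == NULL) {
            while (i-- > 0) {          /* roll back segments i-1 down to 0 */
                release(segs[i], ctx);
                segs[i] = NULL;
            }
            return -1;
        }
    }
    return 0;
}

/* Example allocator: malloc with a success budget, to simulate failure. */
typedef struct { int remaining; int live; } budget_t;

static void *budget_alloc(size_t len, void *ctx)
{
    budget_t *b = ctx;
    if (b->remaining == 0)
        return NULL;
    void *p = malloc(len);
    if (p != NULL) {
        b->remaining--;
        b->live++;
    }
    return p;
}

static void budget_free(void *seg, void *ctx)
{
    budget_t *b = ctx;
    free(seg);
    b->live--;
}
```

Releasing newest-first mirrors the allocation order, which is a common convention for rollback paths.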

gcongiu · 2019-02-15T03:31:15Z

test:jenkins/ch4/most
test:jenkins/ch3/most

gcongiu · 2019-02-15T05:21:36Z

@pavanbalaji @raffenet tests seem fine. I noticed a few minor things that I need to fix. However, if you want you can start looking at the code.

pavanbalaji · 2019-02-15T14:13:18Z

src/include/mpir_memtype.h

+    MPIR_MEMTYPE__DDR = 0,
+    MPIR_MEMTYPE__MCDRAM,
+    MPIR_MEMTYPE__NUM,
+    MPIR_MEMTYPE__DEFAULT = MPIR_MEMTYPE__DDR


Is that allowed in C99?

It should be allowed. The C99 standard does not forbid it (section 6.7.2.2):

> The expression that defines the value of an enumeration constant shall be an integer constant expression that has a value representable as an int. [...] (The use of enumerators with = may produce enumeration constants with values that duplicate other values in the same enumeration.)

My interpretation might be wrong, though.
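Here is a small check of the point under discussion: an enumerator may be initialized with an earlier enumerator of the same enum, and duplicate values are allowed. The enumerators below are copied from the diff. The typedef name `MPIR_memtype_t` and the `memtype_name` helper are assumptions.

```c
#include <assert.h>

/* Same enumerators as in mpir_memtype.h; DEFAULT reuses DDR's value. */
typedef enum {
    MPIR_MEMTYPE__DDR = 0,
    MPIR_MEMTYPE__MCDRAM,                     /* 1 */
    MPIR_MEMTYPE__NUM,                        /* 2: number of real types */
    MPIR_MEMTYPE__DEFAULT = MPIR_MEMTYPE__DDR /* alias, also 0 */
} MPIR_memtype_t;

/* Tables sized by __NUM still have exactly one slot per real type. */
static const char *const memtype_names[MPIR_MEMTYPE__NUM] = { "DDR", "MCDRAM" };

static const char *memtype_name(MPIR_memtype_t t)
{
    assert((int) t >= 0 && (int) t < MPIR_MEMTYPE__NUM);
    return memtype_names[t];
}
```

The aliasing has two consequences. First, a `switch` cannot have both `case MPIR_MEMTYPE__DDR:` and `case MPIR_MEMTYPE__DEFAULT:`, since C99 forbids duplicate case values. Second, the alias has to come after `MPIR_MEMTYPE__NUM`. If it were placed just before it, `MPIR_MEMTYPE__NUM` would be numbered from the alias and become 1 instead of 2.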

pavanbalaji · 2019-02-15T14:17:04Z

src/mpid/common/shm/mpidu_shm_obj.h

+    MPIDU_SHM_OBJ__COPYBUFS,
+    MPIDU_SHM_OBJ__WIN,
+    MPIDU_SHM_OBJ__NUM
+} MPIDU_shm_obj_t;


The memory objects (I'd call these buffer types, instead) are different from memory types. These two should be split into different commits. Memory types are usable even without explicit buffer types, i.e., we can have all buffers be allocated on DRAM or MCDRAM or something else. Buffer types only allow us to have some buffers on one memory type and other buffers on a different memory type.

Makes sense, I will split the two into different commits.
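The distinction between memory types and buffer types can be sketched as a two-level policy: a global memory type that applies to every buffer, plus optional per-buffer-type overrides. With no overrides set, memory types work on their own, as described in the review comment. The enum and struct names here (including `BUF_FASTBOXES`) are hypothetical and are not taken from the patch.

```c
#include <assert.h>

typedef enum { MEMTYPE_DDR = 0, MEMTYPE_MCDRAM, MEMTYPE_NUM } memtype_t;

/* Buffer types (the "objects" of mpidu_shm_obj.h); names are assumed. */
typedef enum { BUF_FASTBOXES = 0, BUF_COPYBUFS, BUF_WIN, BUF_NUM } buftype_t;

#define MEMTYPE_UNSET (-1)

/* Hypothetical policy: one global memory type plus optional per-buffer
 * overrides. With no overrides, every buffer lands on the global type. */
typedef struct {
    memtype_t global;
    int override[BUF_NUM]; /* MEMTYPE_UNSET or a memtype_t value */
} mem_policy_t;

static void policy_init(mem_policy_t *p, memtype_t global)
{
    p->global = global;
    for (int i = 0; i < BUF_NUM; i++)
        p->override[i] = MEMTYPE_UNSET;
}

/* Resolve the memory type for one buffer type. */
static memtype_t policy_resolve(const mem_policy_t *p, buftype_t buf)
{
    assert((int) buf >= 0 && (int) buf < BUF_NUM);
    return p->override[buf] == MEMTYPE_UNSET ? p->global
                                             : (memtype_t) p->override[buf];
}
```

Keeping the two levels separate means the buffer-type layer can be added, or left out, without changing how memory types themselves are selected.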

gcongiu · 2019-06-25T16:16:25Z

I am closing this PR as the code has to be completely rewritten and it is cleaner to start over with a new one.

gcongiu added the WIP label Jun 26, 2018

gcongiu force-pushed the hetero-mem branch 2 times, most recently from 9f8a50f to eb96e22, July 18, 2018 14:14

gcongiu changed the title from "mpi/init: add support to detect MCDRAM nodes on KNL architectures" to "Heterogeneous memory integration" Jul 18, 2018

gcongiu force-pushed the hetero-mem branch 3 times, most recently from 1007b81 to 701ea10, July 20, 2018 15:32

gcongiu force-pushed the hetero-mem branch 3 times, most recently from c838ac0 to 1cbbb67, August 6, 2018 19:34

gcongiu force-pushed the hetero-mem branch from 11412c7 to 7a66231, August 14, 2018 20:45

gcongiu force-pushed the hetero-mem branch from e40a7b2 to a2ce4b3, August 15, 2018 00:14

gcongiu force-pushed the hetero-mem branch from 85ec113 to ed71eba, September 4, 2018 15:11

gcongiu force-pushed the hetero-mem branch from ed71eba to 1d6ca46, September 26, 2018 21:38

gcongiu added this to the mpich-3.4a1 milestone Sep 26, 2018

gcongiu force-pushed the hetero-mem branch from 1d6ca46 to 350b7cd, October 4, 2018 15:42

gcongiu force-pushed the hetero-mem branch 2 times, most recently from 0ae6922 to 764014c, October 4, 2018 17:34

gcongiu force-pushed the hetero-mem branch 5 times, most recently from 8c6b5ea to 2fa570b, October 5, 2018 20:05

gcongiu force-pushed the hetero-mem branch from d0cc79c to 974f641, February 12, 2019 02:05

gcongiu force-pushed the hetero-mem branch from 974f641 to 534d15a, February 13, 2019 20:09

gcongiu force-pushed the hetero-mem branch from 534d15a to d5f20dd, February 13, 2019 23:44

raffenet reviewed Feb 14, 2019


gcongiu and others added 7 commits February 14, 2019 19:54

shm: add binding for shared segment allocations

d8dfc40

test/mpi: add pt2pt tests for heterogeneous memory

1062cfb

test/mpi: add rma tests for heterogeneous memory

81c24b1

gcongiu force-pushed the hetero-mem branch from cf80ee0 to 81c24b1, February 15, 2019 03:29

pavanbalaji suggested changes Feb 15, 2019


gcongiu mentioned this pull request Feb 24, 2019

mpir: introduce hardware topology abstraction layer #3594

Merged

gcongiu closed this Jun 25, 2019

gcongiu mentioned this pull request Jul 5, 2019

Heterogeneous Memory Integration for CH4 Fastboxes #3916

Closed


gcongiu deleted the hetero-mem branch December 8, 2019 03:27
