diff options
-rw-r--r-- | Documentation/x86_64/fake-numa-for-cpusets | 66 |
1 files changed, 66 insertions, 0 deletions
diff --git a/Documentation/x86_64/fake-numa-for-cpusets b/Documentation/x86_64/fake-numa-for-cpusets new file mode 100644 index 000000000000..d1a985c5b00a --- /dev/null +++ b/Documentation/x86_64/fake-numa-for-cpusets | |||
@@ -0,0 +1,66 @@ | |||
1 | Using numa=fake and CPUSets for Resource Management | ||
2 | Written by David Rientjes <rientjes@cs.washington.edu> | ||
3 | |||
4 | This document describes how the numa=fake x86_64 command-line option can be used | ||
5 | in conjunction with cpusets for coarse memory management. Using this feature, | ||
6 | you can create fake NUMA nodes that represent contiguous chunks of memory and | ||
7 | assign them to cpusets and their attached tasks. This is a way of limiting the | ||
8 | amount of system memory that are available to a certain class of tasks. | ||
9 | |||
10 | For more information on the features of cpusets, see Documentation/cpusets.txt. | ||
11 | There are a number of different configurations you can use for your needs. For | ||
12 | more information on the numa=fake command line option and its various ways of | ||
13 | configuring fake nodes, see Documentation/x86_64/boot-options.txt. | ||
14 | |||
15 | For the purposes of this introduction, we'll assume a very primitive NUMA | ||
16 | emulation setup of "numa=fake=4*512,". This will split our system memory into | ||
17 | four equal chunks of 512M each that we can now use to assign to cpusets. As | ||
18 | you become more familiar with using this combination for resource control, | ||
19 | you'll determine a better setup to minimize the number of nodes you have to deal | ||
20 | with. | ||
21 | |||
22 | A machine may be split as follows with "numa=fake=4*512," as reported by dmesg: | ||
23 | |||
24 | Faking node 0 at 0000000000000000-0000000020000000 (512MB) | ||
25 | Faking node 1 at 0000000020000000-0000000040000000 (512MB) | ||
26 | Faking node 2 at 0000000040000000-0000000060000000 (512MB) | ||
27 | Faking node 3 at 0000000060000000-0000000080000000 (512MB) | ||
28 | ... | ||
29 | On node 0 totalpages: 130975 | ||
30 | On node 1 totalpages: 131072 | ||
31 | On node 2 totalpages: 131072 | ||
32 | On node 3 totalpages: 131072 | ||
33 | |||
34 | Now following the instructions for mounting the cpusets filesystem from | ||
35 | Documentation/cpusets.txt, you can assign fake nodes (i.e. contiguous memory | ||
36 | address spaces) to individual cpusets: | ||
37 | |||
38 | [root@xroads /]# mkdir exampleset | ||
39 | [root@xroads /]# mount -t cpuset none exampleset | ||
40 | [root@xroads /]# mkdir exampleset/ddset | ||
41 | [root@xroads /]# cd exampleset/ddset | ||
42 | [root@xroads /exampleset/ddset]# echo 0-1 > cpus | ||
43 | [root@xroads /exampleset/ddset]# echo 0-1 > mems | ||
44 | |||
45 | Now this cpuset, 'ddset', will only allowed access to fake nodes 0 and 1 for | ||
46 | memory allocations (1G). | ||
47 | |||
48 | You can now assign tasks to these cpusets to limit the memory resources | ||
49 | available to them according to the fake nodes assigned as mems: | ||
50 | |||
51 | [root@xroads /exampleset/ddset]# echo $$ > tasks | ||
52 | [root@xroads /exampleset/ddset]# dd if=/dev/zero of=tmp bs=1024 count=1G | ||
53 | [1] 13425 | ||
54 | |||
55 | Notice the difference between the system memory usage as reported by | ||
56 | /proc/meminfo between the restricted cpuset case above and the unrestricted | ||
57 | case (i.e. running the same 'dd' command without assigning it to a fake NUMA | ||
58 | cpuset): | ||
59 | Unrestricted Restricted | ||
60 | MemTotal: 3091900 kB 3091900 kB | ||
61 | MemFree: 42113 kB 1513236 kB | ||
62 | |||
63 | This allows for coarse memory management for the tasks you assign to particular | ||
64 | cpusets. Since cpusets can form a hierarchy, you can create some pretty | ||
65 | interesting combinations of use-cases for various classes of tasks for your | ||
66 | memory management needs. | ||