author    Changbin Du <changbin.du@gmail.com>    2019-05-08 11:21:31 -0400
committer Jonathan Corbet <corbet@lwn.net>       2019-05-08 16:34:11 -0400
commit    1cd7af509dc223905dce622c07ec62e26044e3c0 (patch)
tree      9bfae71e7c0b5248634f08b60f8616572a3f00f3 /Documentation/x86
parent    3d07bc393f9b63ca4c6f9953922f9122a11f29c3 (diff)
Documentation: x86: convert resctrl_ui.txt to reST
This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Diffstat (limited to 'Documentation/x86')
-rw-r--r--  Documentation/x86/index.rst                                            1
-rw-r--r--  Documentation/x86/resctrl_ui.rst (renamed from Documentation/x86/resctrl_ui.txt)  916
2 files changed, 494 insertions, 423 deletions
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index ae29c026be72..6e3c887a0c3b 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -23,3 +23,4 @@ x86-specific Documentation
23 amd-memory-encryption 23 amd-memory-encryption
24 pti 24 pti
25 microcode 25 microcode
26 resctrl_ui
diff --git a/Documentation/x86/resctrl_ui.txt b/Documentation/x86/resctrl_ui.rst
index c1f95b59e14d..225cfd4daaee 100644
--- a/Documentation/x86/resctrl_ui.txt
+++ b/Documentation/x86/resctrl_ui.rst
@@ -1,33 +1,44 @@
1.. SPDX-License-Identifier: GPL-2.0
2.. include:: <isonum.txt>
3
4===========================================
1User Interface for Resource Control feature 5User Interface for Resource Control feature
6===========================================
2 7
3Intel refers to this feature as Intel Resource Director Technology(Intel(R) RDT). 8:Copyright: |copy| 2016 Intel Corporation
4AMD refers to this feature as AMD Platform Quality of Service(AMD QoS). 9:Authors: - Fenghua Yu <fenghua.yu@intel.com>
10 - Tony Luck <tony.luck@intel.com>
11 - Vikas Shivappa <vikas.shivappa@intel.com>
5 12
6Copyright (C) 2016 Intel Corporation
7 13
8Fenghua Yu <fenghua.yu@intel.com> 14Intel refers to this feature as Intel Resource Director Technology(Intel(R) RDT).
9Tony Luck <tony.luck@intel.com> 15AMD refers to this feature as AMD Platform Quality of Service(AMD QoS).
10Vikas Shivappa <vikas.shivappa@intel.com>
11 16
12This feature is enabled by the CONFIG_X86_CPU_RESCTRL and the x86 /proc/cpuinfo 17This feature is enabled by the CONFIG_X86_CPU_RESCTRL and the x86 /proc/cpuinfo
13flag bits: 18flag bits:
14RDT (Resource Director Technology) Allocation - "rdt_a"
15CAT (Cache Allocation Technology) - "cat_l3", "cat_l2"
16CDP (Code and Data Prioritization ) - "cdp_l3", "cdp_l2"
17CQM (Cache QoS Monitoring) - "cqm_llc", "cqm_occup_llc"
18MBM (Memory Bandwidth Monitoring) - "cqm_mbm_total", "cqm_mbm_local"
19MBA (Memory Bandwidth Allocation) - "mba"
20 19
21To use the feature mount the file system: 20============================================= ================================
21RDT (Resource Director Technology) Allocation "rdt_a"
22CAT (Cache Allocation Technology) "cat_l3", "cat_l2"
23CDP (Code and Data Prioritization) "cdp_l3", "cdp_l2"
24CQM (Cache QoS Monitoring) "cqm_llc", "cqm_occup_llc"
25MBM (Memory Bandwidth Monitoring) "cqm_mbm_total", "cqm_mbm_local"
26MBA (Memory Bandwidth Allocation) "mba"
27============================================= ================================
28
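Whether a given system supports these features can be checked against
/proc/cpuinfo; a minimal sketch, assuming GNU grep (the output below is
illustrative only)::

  # grep -o 'rdt_a\|cat_l3\|cat_l2\|cdp_l3\|cdp_l2\|cqm_llc\|cqm_occup_llc\|cqm_mbm_total\|cqm_mbm_local\|mba' /proc/cpuinfo | sort -u
  cat_l3
  cqm_llc
  cqm_occup_llc
  mba
  rdt_a
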
29To use the feature mount the file system::
22 30
23 # mount -t resctrl resctrl [-o cdp[,cdpl2][,mba_MBps]] /sys/fs/resctrl 31 # mount -t resctrl resctrl [-o cdp[,cdpl2][,mba_MBps]] /sys/fs/resctrl
24 32
25mount options are: 33mount options are:
26 34
27"cdp": Enable code/data prioritization in L3 cache allocations. 35"cdp":
28"cdpl2": Enable code/data prioritization in L2 cache allocations. 36 Enable code/data prioritization in L3 cache allocations.
29"mba_MBps": Enable the MBA Software Controller(mba_sc) to specify MBA 37"cdpl2":
30 bandwidth in MBps 38 Enable code/data prioritization in L2 cache allocations.
39"mba_MBps":
40 Enable the MBA Software Controller(mba_sc) to specify MBA
41 bandwidth in MBps
31 42
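For example, a sketch of a mount that enables code/data prioritization on
both cache levels (assuming the CPU reports the "cdp_l3" and "cdp_l2" flags)::

  # mount -t resctrl resctrl -o cdp,cdpl2 /sys/fs/resctrl

With "cdp" in effect, L3 control is exposed as separate L3CODE and L3DATA
schemata entries, as shown in the schemata examples later in this document.
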
32L2 and L3 CDP are controlled separately. 43L2 and L3 CDP are controlled separately.
33 44
@@ -44,7 +55,7 @@ For more details on the behavior of the interface during monitoring
44and allocation, see the "Resource alloc and monitor groups" section. 55and allocation, see the "Resource alloc and monitor groups" section.
45 56
46Info directory 57Info directory
47-------------- 58==============
48 59
49The 'info' directory contains information about the enabled 60The 'info' directory contains information about the enabled
50resources. Each resource has its own subdirectory. The subdirectory 61resources. Each resource has its own subdirectory. The subdirectory
@@ -56,77 +67,93 @@ allocation:
56Cache resource(L3/L2) subdirectory contains the following files 67Cache resource(L3/L2) subdirectory contains the following files
57related to allocation: 68related to allocation:
58 69
59"num_closids": The number of CLOSIDs which are valid for this 70"num_closids":
60 resource. The kernel uses the smallest number of 71 The number of CLOSIDs which are valid for this
61 CLOSIDs of all enabled resources as limit. 72 resource. The kernel uses the smallest number of
62 73 CLOSIDs of all enabled resources as limit.
63"cbm_mask": The bitmask which is valid for this resource. 74"cbm_mask":
64 This mask is equivalent to 100%. 75 The bitmask which is valid for this resource.
65 76 This mask is equivalent to 100%.
66"min_cbm_bits": The minimum number of consecutive bits which 77"min_cbm_bits":
67 must be set when writing a mask. 78 The minimum number of consecutive bits which
68 79 must be set when writing a mask.
69"shareable_bits": Bitmask of shareable resource with other executing 80
70 entities (e.g. I/O). User can use this when 81"shareable_bits":
71 setting up exclusive cache partitions. Note that 82 Bitmask of shareable resource with other executing
72 some platforms support devices that have their 83 entities (e.g. I/O). User can use this when
73 own settings for cache use which can over-ride 84 setting up exclusive cache partitions. Note that
74 these bits. 85 some platforms support devices that have their
75"bit_usage": Annotated capacity bitmasks showing how all 86 own settings for cache use which can over-ride
76 instances of the resource are used. The legend is: 87 these bits.
77 "0" - Corresponding region is unused. When the system's 88"bit_usage":
89 Annotated capacity bitmasks showing how all
90 instances of the resource are used. The legend is:
91
92 "0":
93 Corresponding region is unused. When the system's
78 resources have been allocated and a "0" is found 94 resources have been allocated and a "0" is found
79 in "bit_usage" it is a sign that resources are 95 in "bit_usage" it is a sign that resources are
80 wasted. 96 wasted.
81 "H" - Corresponding region is used by hardware only 97
98 "H":
99 Corresponding region is used by hardware only
82 but available for software use. If a resource 100 but available for software use. If a resource
83 has bits set in "shareable_bits" but not all 101 has bits set in "shareable_bits" but not all
84 of these bits appear in the resource groups' 102 of these bits appear in the resource groups'
85 schematas then the bits appearing in 103 schematas then the bits appearing in
86 "shareable_bits" but no resource group will 104 "shareable_bits" but no resource group will
87 be marked as "H". 105 be marked as "H".
88 "X" - Corresponding region is available for sharing and 106 "X":
107 Corresponding region is available for sharing and
89 used by hardware and software. These are the 108 used by hardware and software. These are the
90 bits that appear in "shareable_bits" as 109 bits that appear in "shareable_bits" as
91 well as a resource group's allocation. 110 well as a resource group's allocation.
92 "S" - Corresponding region is used by software 111 "S":
112 Corresponding region is used by software
93 and available for sharing. 113 and available for sharing.
94 "E" - Corresponding region is used exclusively by 114 "E":
115 Corresponding region is used exclusively by
95 one resource group. No sharing allowed. 116 one resource group. No sharing allowed.
96 "P" - Corresponding region is pseudo-locked. No 117 "P":
118 Corresponding region is pseudo-locked. No
97 sharing allowed. 119 sharing allowed.
98 120
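Taken together, these files describe which cache bitmasks a schemata entry
may contain; a sketch of reading them for an L3 resource (all values below
are illustrative only)::

  # cd /sys/fs/resctrl
  # cat info/L3/num_closids
  16
  # cat info/L3/cbm_mask
  fffff
  # cat info/L3/min_cbm_bits
  1

With a "cbm_mask" of fffff and "min_cbm_bits" of 1, any run of consecutive
set bits within 0xfffff is a valid mask to write into a schemata file.
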
99Memory bandwidth(MB) subdirectory contains the following files 121Memory bandwidth(MB) subdirectory contains the following files
100with respect to allocation: 122with respect to allocation:
101 123
102"min_bandwidth": The minimum memory bandwidth percentage which 124"min_bandwidth":
103 user can request. 125 The minimum memory bandwidth percentage which
126 user can request.
104 127
105"bandwidth_gran": The granularity in which the memory bandwidth 128"bandwidth_gran":
106 percentage is allocated. The allocated 129 The granularity in which the memory bandwidth
107 b/w percentage is rounded off to the next 130 percentage is allocated. The allocated
108 control step available on the hardware. The 131 b/w percentage is rounded off to the next
109 available bandwidth control steps are: 132 control step available on the hardware. The
110 min_bandwidth + N * bandwidth_gran. 133 available bandwidth control steps are:
134 min_bandwidth + N * bandwidth_gran.
111 135
112"delay_linear": Indicates if the delay scale is linear or 136"delay_linear":
113 non-linear. This field is purely informational 137 Indicates if the delay scale is linear or
114 only. 138 non-linear. This field is purely informational
139 only.
115 140
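A sketch of reading these parameters for the memory bandwidth resource
(illustrative values, assuming the resource subdirectory is named "MB")::

  # cat info/MB/min_bandwidth
  10
  # cat info/MB/bandwidth_gran
  10
  # cat info/MB/delay_linear
  1

With min_bandwidth 10 and bandwidth_gran 10 the available control steps are
10, 20, ... 100, so a request such as "MB:0=35" is rounded off to the next
step, 40.
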
116If RDT monitoring is available there will be an "L3_MON" directory 141If RDT monitoring is available there will be an "L3_MON" directory
117with the following files: 142with the following files:
118 143
119"num_rmids": The number of RMIDs available. This is the 144"num_rmids":
120 upper bound for how many "CTRL_MON" + "MON" 145 The number of RMIDs available. This is the
121 groups can be created. 146 upper bound for how many "CTRL_MON" + "MON"
147 groups can be created.
122 148
123"mon_features": Lists the monitoring events if 149"mon_features":
124 monitoring is enabled for the resource. 150 Lists the monitoring events if
151 monitoring is enabled for the resource.
125 152
126"max_threshold_occupancy": 153"max_threshold_occupancy":
127 Read/write file provides the largest value (in 154 Read/write file provides the largest value (in
128 bytes) at which a previously used LLC_occupancy 155 bytes) at which a previously used LLC_occupancy
129 counter can be considered for re-use. 156 counter can be considered for re-use.
130 157
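A sketch of inspecting these monitoring parameters (the event names and
values shown are illustrative only)::

  # cat info/L3_MON/num_rmids
  144
  # cat info/L3_MON/mon_features
  llc_occupancy
  mbm_total_bytes
  mbm_local_bytes
  # echo 65536 > info/L3_MON/max_threshold_occupancy
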
131Finally, in the top level of the "info" directory there is a file 158Finally, in the top level of the "info" directory there is a file
132named "last_cmd_status". This is reset with every "command" issued 159named "last_cmd_status". This is reset with every "command" issued
@@ -134,6 +161,7 @@ via the file system (making new directories or writing to any of the
134control files). If the command was successful, it will read as "ok". 161control files). If the command was successful, it will read as "ok".
135If the command failed, it will provide more information that can be 162If the command failed, it will provide more information that can be
136conveyed in the error returns from file operations. E.g. 163conveyed in the error returns from file operations. E.g.
164::
137 165
138 # echo L3:0=f7 > schemata 166 # echo L3:0=f7 > schemata
139 bash: echo: write error: Invalid argument 167 bash: echo: write error: Invalid argument
@@ -141,7 +169,7 @@ conveyed in the error returns from file operations. E.g.
141 mask f7 has non-consecutive 1-bits 169 mask f7 has non-consecutive 1-bits
142 170
143Resource alloc and monitor groups 171Resource alloc and monitor groups
144--------------------------------- 172=================================
145 173
146Resource groups are represented as directories in the resctrl file 174Resource groups are represented as directories in the resctrl file
147system. The default group is the root directory which, immediately 175system. The default group is the root directory which, immediately
@@ -226,6 +254,7 @@ When monitoring is enabled all MON groups will also contain:
226 254
227Resource allocation rules 255Resource allocation rules
228------------------------- 256-------------------------
257
229When a task is running the following rules define which resources are 258When a task is running the following rules define which resources are
230available to it: 259available to it:
231 260
@@ -252,7 +281,7 @@ Resource monitoring rules
252 281
253 282
254Notes on cache occupancy monitoring and control 283Notes on cache occupancy monitoring and control
255----------------------------------------------- 284===============================================
256When moving a task from one group to another you should remember that 285When moving a task from one group to another you should remember that
257this only affects *new* cache allocations by the task. E.g. you may have 286this only affects *new* cache allocations by the task. E.g. you may have
258a task in a monitor group showing 3 MB of cache occupancy. If you move 287a task in a monitor group showing 3 MB of cache occupancy. If you move
@@ -321,7 +350,7 @@ of the capacity of the cache. You could partition the cache into four
321equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000. 350equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.
322 351
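A sketch of such a four-way partition using those masks (the group names
below are hypothetical)::

  # mkdir part0 part1 part2 part3
  # echo "L3:0=1f"    > part0/schemata
  # echo "L3:0=3e0"   > part1/schemata
  # echo "L3:0=7c00"  > part2/schemata
  # echo "L3:0=f8000" > part3/schemata
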
323Memory bandwidth Allocation and monitoring 352Memory bandwidth Allocation and monitoring
324------------------------------------------ 353==========================================
325 354
326For Memory bandwidth resource, by default the user controls the resource 355For Memory bandwidth resource, by default the user controls the resource
327by indicating the percentage of total memory bandwidth. 356by indicating the percentage of total memory bandwidth.
@@ -369,7 +398,7 @@ In order to mitigate this and make the interface more user friendly,
369resctrl added support for specifying the bandwidth in MBps as well. The 398resctrl added support for specifying the bandwidth in MBps as well. The
370kernel underneath would use a software feedback mechanism or a "Software 399kernel underneath would use a software feedback mechanism or a "Software
371Controller(mba_sc)" which reads the actual bandwidth using MBM counters 400Controller(mba_sc)" which reads the actual bandwidth using MBM counters
372and adjusts the memory bandwidth percentages to ensure 401and adjusts the memory bandwidth percentages to ensure::
373 402
374 "actual bandwidth < user specified bandwidth". 403 "actual bandwidth < user specified bandwidth".
375 404
@@ -380,14 +409,14 @@ sections.
380 409
381L3 schemata file details (code and data prioritization disabled) 410L3 schemata file details (code and data prioritization disabled)
382---------------------------------------------------------------- 411----------------------------------------------------------------
383With CDP disabled the L3 schemata format is: 412With CDP disabled the L3 schemata format is::
384 413
385 L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... 414 L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
386 415
387L3 schemata file details (CDP enabled via mount option to resctrl) 416L3 schemata file details (CDP enabled via mount option to resctrl)
388------------------------------------------------------------------ 417------------------------------------------------------------------
389When CDP is enabled L3 control is split into two separate resources 418When CDP is enabled L3 control is split into two separate resources
390so you can specify independent masks for code and data like this: 419so you can specify independent masks for code and data like this::
391 420
392 L3data:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... 421 L3data:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
393 L3code:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... 422 L3code:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
@@ -395,7 +424,7 @@ so you can specify independent masks for code and data like this:
395L2 schemata file details 424L2 schemata file details
396------------------------ 425------------------------
397L2 cache does not support code and data prioritization, so the 426L2 cache does not support code and data prioritization, so the
398schemata format is always: 427schemata format is always::
399 428
400 L2:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... 429 L2:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
401 430
@@ -403,6 +432,7 @@ Memory bandwidth Allocation (default mode)
403------------------------------------------ 432------------------------------------------
404 433
405Memory b/w domain is L3 cache. 434Memory b/w domain is L3 cache.
435::
406 436
407 MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;... 437 MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...
408 438
@@ -410,6 +440,7 @@ Memory bandwidth Allocation specified in MBps
410--------------------------------------------- 440---------------------------------------------
411 441
412Memory bandwidth domain is L3 cache. 442Memory bandwidth domain is L3 cache.
443::
413 444
414 MB:<cache_id0>=bw_MBps0;<cache_id1>=bw_MBps1;... 445 MB:<cache_id0>=bw_MBps0;<cache_id1>=bw_MBps1;...
415 446
@@ -418,17 +449,18 @@ Reading/writing the schemata file
418Reading the schemata file will show the state of all resources 449Reading the schemata file will show the state of all resources
419on all domains. When writing you only need to specify those values 450on all domains. When writing you only need to specify those values
420which you wish to change. E.g. 451which you wish to change. E.g.
452::
421 453
422# cat schemata 454 # cat schemata
423L3DATA:0=fffff;1=fffff;2=fffff;3=fffff 455 L3DATA:0=fffff;1=fffff;2=fffff;3=fffff
424L3CODE:0=fffff;1=fffff;2=fffff;3=fffff 456 L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
425# echo "L3DATA:2=3c0;" > schemata 457 # echo "L3DATA:2=3c0;" > schemata
426# cat schemata 458 # cat schemata
427L3DATA:0=fffff;1=fffff;2=3c0;3=fffff 459 L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
428L3CODE:0=fffff;1=fffff;2=fffff;3=fffff 460 L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
429 461
430Cache Pseudo-Locking 462Cache Pseudo-Locking
431-------------------- 463====================
432CAT enables a user to specify the amount of cache space that an 464CAT enables a user to specify the amount of cache space that an
433application can fill. Cache pseudo-locking builds on the fact that a 465application can fill. Cache pseudo-locking builds on the fact that a
434CPU can still read and write data pre-allocated outside its current 466CPU can still read and write data pre-allocated outside its current
@@ -442,6 +474,7 @@ a region of memory with reduced average read latency.
442The creation of a cache pseudo-locked region is triggered by a request 474The creation of a cache pseudo-locked region is triggered by a request
443from the user to do so that is accompanied by a schemata of the region 475from the user to do so that is accompanied by a schemata of the region
444to be pseudo-locked. The cache pseudo-locked region is created as follows: 476to be pseudo-locked. The cache pseudo-locked region is created as follows:
477
445- Create a CAT allocation CLOSNEW with a CBM matching the schemata 478- Create a CAT allocation CLOSNEW with a CBM matching the schemata
446 from the user of the cache region that will contain the pseudo-locked 479 from the user of the cache region that will contain the pseudo-locked
447 memory. This region must not overlap with any current CAT allocation/CLOS 480 memory. This region must not overlap with any current CAT allocation/CLOS
@@ -480,6 +513,7 @@ initial mmap() handling, there is no enforcement afterwards and the
480application itself needs to ensure it remains affine to the correct cores. 513application itself needs to ensure it remains affine to the correct cores.
481 514
482Pseudo-locking is accomplished in two stages: 515Pseudo-locking is accomplished in two stages:
516
4831) During the first stage the system administrator allocates a portion 5171) During the first stage the system administrator allocates a portion
484 of cache that should be dedicated to pseudo-locking. At this time an 518 of cache that should be dedicated to pseudo-locking. At this time an
485 equivalent portion of memory is allocated, loaded into allocated 519 equivalent portion of memory is allocated, loaded into allocated
@@ -506,7 +540,7 @@ by user space in order to obtain access to the pseudo-locked memory region.
506An example of cache pseudo-locked region creation and usage can be found below. 540An example of cache pseudo-locked region creation and usage can be found below.
507 541
508Cache Pseudo-Locking Debugging Interface 542Cache Pseudo-Locking Debugging Interface
509--------------------------------------- 543----------------------------------------
510The pseudo-locking debugging interface is enabled by default (if 544The pseudo-locking debugging interface is enabled by default (if
511CONFIG_DEBUG_FS is enabled) and can be found in /sys/kernel/debug/resctrl. 545CONFIG_DEBUG_FS is enabled) and can be found in /sys/kernel/debug/resctrl.
512 546
@@ -514,6 +548,7 @@ There is no explicit way for the kernel to test if a provided memory
514location is present in the cache. The pseudo-locking debugging interface uses 548location is present in the cache. The pseudo-locking debugging interface uses
515the tracing infrastructure to provide two ways to measure cache residency of 549the tracing infrastructure to provide two ways to measure cache residency of
516the pseudo-locked region: 550the pseudo-locked region:
551
5171) Memory access latency using the pseudo_lock_mem_latency tracepoint. Data 5521) Memory access latency using the pseudo_lock_mem_latency tracepoint. Data
518 from these measurements are best visualized using a hist trigger (see 553 from these measurements are best visualized using a hist trigger (see
519 example below). In this test the pseudo-locked region is traversed at 554 example below). In this test the pseudo-locked region is traversed at
@@ -529,87 +564,97 @@ it in debugfs as /sys/kernel/debug/resctrl/<newdir>. A single
529write-only file, pseudo_lock_measure, is present in this directory. The 564write-only file, pseudo_lock_measure, is present in this directory. The
530measurement of the pseudo-locked region depends on the number written to this 565measurement of the pseudo-locked region depends on the number written to this
531debugfs file: 566debugfs file:
5321 - writing "1" to the pseudo_lock_measure file will trigger the latency 567
5681:
569 writing "1" to the pseudo_lock_measure file will trigger the latency
533 measurement captured in the pseudo_lock_mem_latency tracepoint. See 570 measurement captured in the pseudo_lock_mem_latency tracepoint. See
534 example below. 571 example below.
5352 - writing "2" to the pseudo_lock_measure file will trigger the L2 cache 5722:
573 writing "2" to the pseudo_lock_measure file will trigger the L2 cache
536 residency (cache hits and misses) measurement captured in the 574 residency (cache hits and misses) measurement captured in the
537 pseudo_lock_l2 tracepoint. See example below. 575 pseudo_lock_l2 tracepoint. See example below.
5383 - writing "3" to the pseudo_lock_measure file will trigger the L3 cache 5763:
577 writing "3" to the pseudo_lock_measure file will trigger the L3 cache
539 residency (cache hits and misses) measurement captured in the 578 residency (cache hits and misses) measurement captured in the
540 pseudo_lock_l3 tracepoint. 579 pseudo_lock_l3 tracepoint.
541 580
542All measurements are recorded with the tracing infrastructure. This requires 581All measurements are recorded with the tracing infrastructure. This requires
543the relevant tracepoints to be enabled before the measurement is triggered. 582the relevant tracepoints to be enabled before the measurement is triggered.
544 583
545Example of latency debugging interface: 584Example of latency debugging interface
585~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
546In this example a pseudo-locked region named "newlock" was created. Here is 586In this example a pseudo-locked region named "newlock" was created. Here is
547how we can measure the latency in cycles of reading from this region and 587how we can measure the latency in cycles of reading from this region and
548visualize this data with a histogram that is available if CONFIG_HIST_TRIGGERS 588visualize this data with a histogram that is available if CONFIG_HIST_TRIGGERS
549is set: 589is set::
550# :> /sys/kernel/debug/tracing/trace 590
551# echo 'hist:keys=latency' > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/trigger 591 # :> /sys/kernel/debug/tracing/trace
552# echo 1 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/enable 592 # echo 'hist:keys=latency' > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/trigger
553# echo 1 > /sys/kernel/debug/resctrl/newlock/pseudo_lock_measure 593 # echo 1 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/enable
554# echo 0 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/enable 594 # echo 1 > /sys/kernel/debug/resctrl/newlock/pseudo_lock_measure
555# cat /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/hist 595 # echo 0 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/enable
556 596 # cat /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/hist
557# event histogram 597
558# 598 # event histogram
559# trigger info: hist:keys=latency:vals=hitcount:sort=hitcount:size=2048 [active] 599 #
560# 600 # trigger info: hist:keys=latency:vals=hitcount:sort=hitcount:size=2048 [active]
561 601 #
562{ latency: 456 } hitcount: 1 602
563{ latency: 50 } hitcount: 83 603 { latency: 456 } hitcount: 1
564{ latency: 36 } hitcount: 96 604 { latency: 50 } hitcount: 83
565{ latency: 44 } hitcount: 174 605 { latency: 36 } hitcount: 96
566{ latency: 48 } hitcount: 195 606 { latency: 44 } hitcount: 174
567{ latency: 46 } hitcount: 262 607 { latency: 48 } hitcount: 195
568{ latency: 42 } hitcount: 693 608 { latency: 46 } hitcount: 262
569{ latency: 40 } hitcount: 3204 609 { latency: 42 } hitcount: 693
570{ latency: 38 } hitcount: 3484 610 { latency: 40 } hitcount: 3204
571 611 { latency: 38 } hitcount: 3484
572Totals: 612
573 Hits: 8192 613 Totals:
574 Entries: 9 614 Hits: 8192
575 Dropped: 0 615 Entries: 9
576 616 Dropped: 0
577Example of cache hits/misses debugging: 617
618Example of cache hits/misses debugging
619~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
578In this example a pseudo-locked region named "newlock" was created on the L2 620In this example a pseudo-locked region named "newlock" was created on the L2
579cache of a platform. Here is how we can obtain details of the cache hits 621cache of a platform. Here is how we can obtain details of the cache hits
580and misses using the platform's precision counters. 622and misses using the platform's precision counters.
623::
581 624
582# :> /sys/kernel/debug/tracing/trace 625 # :> /sys/kernel/debug/tracing/trace
583# echo 1 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_l2/enable 626 # echo 1 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_l2/enable
584# echo 2 > /sys/kernel/debug/resctrl/newlock/pseudo_lock_measure 627 # echo 2 > /sys/kernel/debug/resctrl/newlock/pseudo_lock_measure
585# echo 0 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_l2/enable 628 # echo 0 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_l2/enable
586# cat /sys/kernel/debug/tracing/trace 629 # cat /sys/kernel/debug/tracing/trace
587 630
588# tracer: nop 631 # tracer: nop
589# 632 #
590# _-----=> irqs-off 633 # _-----=> irqs-off
591# / _----=> need-resched 634 # / _----=> need-resched
592# | / _---=> hardirq/softirq 635 # | / _---=> hardirq/softirq
593# || / _--=> preempt-depth 636 # || / _--=> preempt-depth
594# ||| / delay 637 # ||| / delay
595# TASK-PID CPU# |||| TIMESTAMP FUNCTION 638 # TASK-PID CPU# |||| TIMESTAMP FUNCTION
596# | | | |||| | | 639 # | | | |||| | |
597 pseudo_lock_mea-1672 [002] .... 3132.860500: pseudo_lock_l2: hits=4097 miss=0 640 pseudo_lock_mea-1672 [002] .... 3132.860500: pseudo_lock_l2: hits=4097 miss=0
598 641
599 642
600Examples for RDT allocation usage: 643Examples for RDT allocation usage
644~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
645
6461) Example 1
601 647
602Example 1
603---------
604On a two socket machine (one L3 cache per socket) with just four bits 648On a two socket machine (one L3 cache per socket) with just four bits
605for cache bit masks, minimum b/w of 10% with a memory bandwidth 649for cache bit masks, minimum b/w of 10% with a memory bandwidth
606granularity of 10% 650granularity of 10%.
651::
607 652
608# mount -t resctrl resctrl /sys/fs/resctrl 653 # mount -t resctrl resctrl /sys/fs/resctrl
609# cd /sys/fs/resctrl 654 # cd /sys/fs/resctrl
610# mkdir p0 p1 655 # mkdir p0 p1
611# echo "L3:0=3;1=c\nMB:0=50;1=50" > /sys/fs/resctrl/p0/schemata 656 # echo "L3:0=3;1=c\nMB:0=50;1=50" > /sys/fs/resctrl/p0/schemata
612# echo "L3:0=3;1=3\nMB:0=50;1=50" > /sys/fs/resctrl/p1/schemata 657 # echo "L3:0=3;1=3\nMB:0=50;1=50" > /sys/fs/resctrl/p1/schemata
613 658
614The default resource group is unmodified, so we have access to all parts 659The default resource group is unmodified, so we have access to all parts
615of all caches (its schemata file reads "L3:0=f;1=f"). 660of all caches (its schemata file reads "L3:0=f;1=f").
@@ -628,100 +673,106 @@ the b/w accordingly.
628 673
629If the MBA is specified in MB(megabytes) then the user can enter the max b/w in MB 674If the MBA is specified in MB(megabytes) then the user can enter the max b/w in MB
630rather than the percentage values. 675rather than the percentage values.
676::
631 677
632# echo "L3:0=3;1=c\nMB:0=1024;1=500" > /sys/fs/resctrl/p0/schemata 678 # echo "L3:0=3;1=c\nMB:0=1024;1=500" > /sys/fs/resctrl/p0/schemata
633# echo "L3:0=3;1=3\nMB:0=1024;1=500" > /sys/fs/resctrl/p1/schemata 679 # echo "L3:0=3;1=3\nMB:0=1024;1=500" > /sys/fs/resctrl/p1/schemata
634 680
635In the above example the tasks in "p1" and "p0" on socket 0 would use a max b/w 681In the above example the tasks in "p1" and "p0" on socket 0 would use a max b/w
636of 1024MB whereas on socket 1 they would use 500MB. 682of 1024MB whereas on socket 1 they would use 500MB.
637 683
638Example 2 6842) Example 2
639--------- 685
640Again two sockets, but this time with a more realistic 20-bit mask. 686Again two sockets, but this time with a more realistic 20-bit mask.
641 687
642Two real time tasks pid=1234 running on processor 0 and pid=5678 running on 688Two real time tasks pid=1234 running on processor 0 and pid=5678 running on
643processor 1 on socket 0 on a 2-socket and dual core machine. To avoid noisy 689processor 1 on socket 0 on a 2-socket and dual core machine. To avoid noisy
644neighbors, each of the two real-time tasks exclusively occupies one quarter 690neighbors, each of the two real-time tasks exclusively occupies one quarter
645of L3 cache on socket 0. 691of L3 cache on socket 0.
692::
646 693
647# mount -t resctrl resctrl /sys/fs/resctrl 694 # mount -t resctrl resctrl /sys/fs/resctrl
648# cd /sys/fs/resctrl 695 # cd /sys/fs/resctrl
649 696
650First we reset the schemata for the default group so that the "upper" 697First we reset the schemata for the default group so that the "upper"
65150% of the L3 cache on socket 0 and 50% of memory b/w cannot be used by 69850% of the L3 cache on socket 0 and 50% of memory b/w cannot be used by
652ordinary tasks: 699ordinary tasks::
653 700
654# echo "L3:0=3ff;1=fffff\nMB:0=50;1=100" > schemata 701 # echo "L3:0=3ff;1=fffff\nMB:0=50;1=100" > schemata
655 702
656Next we make a resource group for our first real time task and give 703Next we make a resource group for our first real time task and give
657it access to the "top" 25% of the cache on socket 0. 704it access to the "top" 25% of the cache on socket 0.
705::
658 706
659# mkdir p0 707 # mkdir p0
660# echo "L3:0=f8000;1=fffff" > p0/schemata 708 # echo "L3:0=f8000;1=fffff" > p0/schemata
661 709
662Finally we move our first real time task into this resource group. We 710Finally we move our first real time task into this resource group. We
663also use taskset(1) to ensure the task always runs on a dedicated CPU 711also use taskset(1) to ensure the task always runs on a dedicated CPU
664on socket 0. Most uses of resource groups will also constrain which 712on socket 0. Most uses of resource groups will also constrain which
665processors tasks run on. 713processors tasks run on.
714::
666 715
667# echo 1234 > p0/tasks 716 # echo 1234 > p0/tasks
668# taskset -cp 1 1234 717 # taskset -cp 1 1234
669 718
670Ditto for the second real time task (with the remaining 25% of cache): 719Ditto for the second real time task (with the remaining 25% of cache)::
671 720
672# mkdir p1 721 # mkdir p1
673# echo "L3:0=7c00;1=fffff" > p1/schemata 722 # echo "L3:0=7c00;1=fffff" > p1/schemata
674# echo 5678 > p1/tasks 723 # echo 5678 > p1/tasks
675# taskset -cp 2 5678 724 # taskset -cp 2 5678
676 725
677For the same 2 socket system with memory b/w resource and CAT L3 the 726For the same 2 socket system with memory b/w resource and CAT L3 the
678schemata would look like(Assume min_bandwidth 10 and bandwidth_gran is 727schemata would look like(Assume min_bandwidth 10 and bandwidth_gran is
67910): 72810):
680 729
681For our first real time task this would request 20% memory b/w on socket 730For our first real time task this would request 20% memory b/w on socket 0.
6820. 731::
683 732
684# echo -e "L3:0=f8000;1=fffff\nMB:0=20;1=100" > p0/schemata 733 # echo -e "L3:0=f8000;1=fffff\nMB:0=20;1=100" > p0/schemata
685 734
686For our second real time task this would request another 20% memory b/w 735For our second real time task this would request another 20% memory b/w
687on socket 0. 736on socket 0.
737::
688 738
689# echo -e "L3:0=f8000;1=fffff\nMB:0=20;1=100" > p0/schemata 739 # echo -e "L3:0=f8000;1=fffff\nMB:0=20;1=100" > p0/schemata
690 740
691Example 3 7413) Example 3
692---------
693 742
694A single socket system which has real-time tasks running on cores 4-7 and 743A single socket system which has real-time tasks running on cores 4-7 and
695non real-time workload assigned to cores 0-3. The real-time tasks share text 744non real-time workload assigned to cores 0-3. The real-time tasks share text
696and data, so a per task association is not required and due to interaction 745and data, so a per task association is not required and due to interaction
697with the kernel it's desired that the kernel on these cores shares L3 with 746with the kernel it's desired that the kernel on these cores shares L3 with
698the tasks. 747the tasks.
748::
699 749
700# mount -t resctrl resctrl /sys/fs/resctrl 750 # mount -t resctrl resctrl /sys/fs/resctrl
701# cd /sys/fs/resctrl 751 # cd /sys/fs/resctrl
702 752
703First we reset the schemata for the default group so that the "upper" 753First we reset the schemata for the default group so that the "upper"
70450% of the L3 cache on socket 0, and 50% of memory bandwidth on socket 0 75450% of the L3 cache on socket 0, and 50% of memory bandwidth on socket 0
705cannot be used by ordinary tasks: 755cannot be used by ordinary tasks::
706 756
707# echo "L3:0=3ff\nMB:0=50" > schemata 757 # echo "L3:0=3ff\nMB:0=50" > schemata
708 758
709Next we make a resource group for our real time cores and give it access 759Next we make a resource group for our real time cores and give it access
710to the "top" 50% of the cache on socket 0 and 50% of memory bandwidth on 760to the "top" 50% of the cache on socket 0 and 50% of memory bandwidth on
711socket 0. 761socket 0.
762::
712 763
713# mkdir p0 764 # mkdir p0
714# echo "L3:0=ffc00\nMB:0=50" > p0/schemata 765 # echo "L3:0=ffc00\nMB:0=50" > p0/schemata
715 766
716Finally we move core 4-7 over to the new group and make sure that the 767Finally we move core 4-7 over to the new group and make sure that the
717kernel and the tasks running there get 50% of the cache. They should 768kernel and the tasks running there get 50% of the cache. They should
718also get 50% of memory bandwidth assuming that the cores 4-7 are SMT 769also get 50% of memory bandwidth assuming that the cores 4-7 are SMT
719siblings and only the real time threads are scheduled on the cores 4-7. 770siblings and only the real time threads are scheduled on the cores 4-7.
771::
720 772
721# echo F0 > p0/cpus 773 # echo F0 > p0/cpus
722 774
723Example 4 7754) Example 4
724---------
725 776
726The resource groups in previous examples were all in the default "shareable" 777The resource groups in previous examples were all in the default "shareable"
727mode allowing sharing of their cache allocations. If one resource group 778mode allowing sharing of their cache allocations. If one resource group
@@ -732,157 +783,168 @@ In this example a new exclusive resource group will be created on a L2 CAT
732system with two L2 cache instances that can be configured with an 8-bit 783system with two L2 cache instances that can be configured with an 8-bit
733capacity bitmask. The new exclusive resource group will be configured to use 784capacity bitmask. The new exclusive resource group will be configured to use
73425% of each cache instance. 78525% of each cache instance.
786::
735 787
736# mount -t resctrl resctrl /sys/fs/resctrl/ 788 # mount -t resctrl resctrl /sys/fs/resctrl/
737# cd /sys/fs/resctrl 789 # cd /sys/fs/resctrl
738 790
739First, we observe that the default group is configured to allocate to all L2 791First, we observe that the default group is configured to allocate to all L2
740cache: 792cache::
741 793
742# cat schemata 794 # cat schemata
743L2:0=ff;1=ff 795 L2:0=ff;1=ff
744 796
745We could attempt to create the new resource group at this point, but it will 797We could attempt to create the new resource group at this point, but it will
746fail because of the overlap with the schemata of the default group: 798fail because of the overlap with the schemata of the default group::
747# mkdir p0 799
748# echo 'L2:0=0x3;1=0x3' > p0/schemata 800 # mkdir p0
749# cat p0/mode 801 # echo 'L2:0=0x3;1=0x3' > p0/schemata
750shareable 802 # cat p0/mode
751# echo exclusive > p0/mode 803 shareable
752-sh: echo: write error: Invalid argument 804 # echo exclusive > p0/mode
753# cat info/last_cmd_status 805 -sh: echo: write error: Invalid argument
754schemata overlaps 806 # cat info/last_cmd_status
807 schemata overlaps
755 808
756To ensure that there is no overlap with another resource group the default 809To ensure that there is no overlap with another resource group the default
757resource group's schemata has to change, making it possible for the new 810resource group's schemata has to change, making it possible for the new
758resource group to become exclusive. 811resource group to become exclusive.
759# echo 'L2:0=0xfc;1=0xfc' > schemata 812::
760# echo exclusive > p0/mode 813
761# grep . p0/* 814 # echo 'L2:0=0xfc;1=0xfc' > schemata
762p0/cpus:0 815 # echo exclusive > p0/mode
763p0/mode:exclusive 816 # grep . p0/*
764p0/schemata:L2:0=03;1=03 817 p0/cpus:0
765p0/size:L2:0=262144;1=262144 818 p0/mode:exclusive
819 p0/schemata:L2:0=03;1=03
820 p0/size:L2:0=262144;1=262144
766 821
767A new resource group will on creation not overlap with an exclusive resource 822A new resource group will on creation not overlap with an exclusive resource
768group: 823group::
769# mkdir p1 824
770# grep . p1/* 825 # mkdir p1
771p1/cpus:0 826 # grep . p1/*
772p1/mode:shareable 827 p1/cpus:0
773p1/schemata:L2:0=fc;1=fc 828 p1/mode:shareable
774p1/size:L2:0=786432;1=786432 829 p1/schemata:L2:0=fc;1=fc
775 830 p1/size:L2:0=786432;1=786432
776The bit_usage will reflect how the cache is used: 831
777# cat info/L2/bit_usage 832The bit_usage will reflect how the cache is used::
7780=SSSSSSEE;1=SSSSSSEE 833
779 834 # cat info/L2/bit_usage
780A resource group cannot be forced to overlap with an exclusive resource group: 835 0=SSSSSSEE;1=SSSSSSEE
781# echo 'L2:0=0x1;1=0x1' > p1/schemata 836
782-sh: echo: write error: Invalid argument 837A resource group cannot be forced to overlap with an exclusive resource group::
783# cat info/last_cmd_status 838
784overlaps with exclusive group 839 # echo 'L2:0=0x1;1=0x1' > p1/schemata
840 -sh: echo: write error: Invalid argument
841 # cat info/last_cmd_status
842 overlaps with exclusive group
785 843
786Example of Cache Pseudo-Locking 844Example of Cache Pseudo-Locking
787------------------------------- 845~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
788Lock portion of L2 cache from cache id 1 using CBM 0x3. Pseudo-locked 846Lock portion of L2 cache from cache id 1 using CBM 0x3. Pseudo-locked
789region is exposed at /dev/pseudo_lock/newlock that can be provided to 847region is exposed at /dev/pseudo_lock/newlock that can be provided to
790application for argument to mmap(). 848application for argument to mmap().
849::
791 850
792# mount -t resctrl resctrl /sys/fs/resctrl/ 851 # mount -t resctrl resctrl /sys/fs/resctrl/
793# cd /sys/fs/resctrl 852 # cd /sys/fs/resctrl
794 853
795Ensure that there are bits available that can be pseudo-locked, since only 854Ensure that there are bits available that can be pseudo-locked, since only
796unused bits can be pseudo-locked, the bits to be pseudo-locked need to be 855unused bits can be pseudo-locked, the bits to be pseudo-locked need to be
797removed from the default resource group's schemata: 856removed from the default resource group's schemata::
798# cat info/L2/bit_usage 857
7990=SSSSSSSS;1=SSSSSSSS 858 # cat info/L2/bit_usage
800# echo 'L2:1=0xfc' > schemata 859 0=SSSSSSSS;1=SSSSSSSS
801# cat info/L2/bit_usage 860 # echo 'L2:1=0xfc' > schemata
8020=SSSSSSSS;1=SSSSSS00 861 # cat info/L2/bit_usage
862 0=SSSSSSSS;1=SSSSSS00
803 863
804Create a new resource group that will be associated with the pseudo-locked 864Create a new resource group that will be associated with the pseudo-locked
805region, indicate that it will be used for a pseudo-locked region, and 865region, indicate that it will be used for a pseudo-locked region, and
806configure the requested pseudo-locked region capacity bitmask: 866configure the requested pseudo-locked region capacity bitmask::
807 867
808# mkdir newlock 868 # mkdir newlock
809# echo pseudo-locksetup > newlock/mode 869 # echo pseudo-locksetup > newlock/mode
810# echo 'L2:1=0x3' > newlock/schemata 870 # echo 'L2:1=0x3' > newlock/schemata
811 871
812On success the resource group's mode will change to pseudo-locked, the 872On success the resource group's mode will change to pseudo-locked, the
813bit_usage will reflect the pseudo-locked region, and the character device 873bit_usage will reflect the pseudo-locked region, and the character device
814exposing the pseudo-locked region will exist: 874exposing the pseudo-locked region will exist::
815 875
816# cat newlock/mode 876 # cat newlock/mode
817pseudo-locked 877 pseudo-locked
818# cat info/L2/bit_usage 878 # cat info/L2/bit_usage
8190=SSSSSSSS;1=SSSSSSPP 879 0=SSSSSSSS;1=SSSSSSPP
820# ls -l /dev/pseudo_lock/newlock 880 # ls -l /dev/pseudo_lock/newlock
821crw------- 1 root root 243, 0 Apr 3 05:01 /dev/pseudo_lock/newlock 881 crw------- 1 root root 243, 0 Apr 3 05:01 /dev/pseudo_lock/newlock
822 882
823/* 883::
824 * Example code to access one page of pseudo-locked cache region 884
825 * from user space. 885 /*
826 */ 886 * Example code to access one page of pseudo-locked cache region
827#define _GNU_SOURCE 887 * from user space.
828#include <fcntl.h> 888 */
829#include <sched.h> 889 #define _GNU_SOURCE
830#include <stdio.h> 890 #include <fcntl.h>
831#include <stdlib.h> 891 #include <sched.h>
832#include <unistd.h> 892 #include <stdio.h>
833#include <sys/mman.h> 893 #include <stdlib.h>
834 894 #include <unistd.h>
835/* 895 #include <sys/mman.h>
836 * It is required that the application runs with affinity to only 896
837 * cores associated with the pseudo-locked region. Here the cpu 897 /*
838 * is hardcoded for convenience of example. 898 * It is required that the application runs with affinity to only
839 */ 899 * cores associated with the pseudo-locked region. Here the cpu
840static int cpuid = 2; 900 * is hardcoded for convenience of example.
841 901 */
842int main(int argc, char *argv[]) 902 static int cpuid = 2;
843{ 903
844 cpu_set_t cpuset; 904 int main(int argc, char *argv[])
845 long page_size; 905 {
846 void *mapping; 906 cpu_set_t cpuset;
847 int dev_fd; 907 long page_size;
848 int ret; 908 void *mapping;
849 909 int dev_fd;
850 page_size = sysconf(_SC_PAGESIZE); 910 int ret;
851 911
852 CPU_ZERO(&cpuset); 912 page_size = sysconf(_SC_PAGESIZE);
853 CPU_SET(cpuid, &cpuset); 913
854 ret = sched_setaffinity(0, sizeof(cpuset), &cpuset); 914 CPU_ZERO(&cpuset);
855 if (ret < 0) { 915 CPU_SET(cpuid, &cpuset);
856 perror("sched_setaffinity"); 916 ret = sched_setaffinity(0, sizeof(cpuset), &cpuset);
857 exit(EXIT_FAILURE); 917 if (ret < 0) {
858 } 918 perror("sched_setaffinity");
859 919 exit(EXIT_FAILURE);
860 dev_fd = open("/dev/pseudo_lock/newlock", O_RDWR); 920 }
861 if (dev_fd < 0) { 921
862 perror("open"); 922 dev_fd = open("/dev/pseudo_lock/newlock", O_RDWR);
863 exit(EXIT_FAILURE); 923 if (dev_fd < 0) {
864 } 924 perror("open");
865 925 exit(EXIT_FAILURE);
866 mapping = mmap(0, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, 926 }
867 dev_fd, 0); 927
868 if (mapping == MAP_FAILED) { 928 mapping = mmap(0, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
869 perror("mmap"); 929 dev_fd, 0);
870 close(dev_fd); 930 if (mapping == MAP_FAILED) {
871 exit(EXIT_FAILURE); 931 perror("mmap");
872 } 932 close(dev_fd);
873 933 exit(EXIT_FAILURE);
874 /* Application interacts with pseudo-locked memory @mapping */ 934 }
875 935
876 ret = munmap(mapping, page_size); 936 /* Application interacts with pseudo-locked memory @mapping */
877 if (ret < 0) { 937
878 perror("munmap"); 938 ret = munmap(mapping, page_size);
879 close(dev_fd); 939 if (ret < 0) {
880 exit(EXIT_FAILURE); 940 perror("munmap");
881 } 941 close(dev_fd);
882 942 exit(EXIT_FAILURE);
883 close(dev_fd); 943 }
884 exit(EXIT_SUCCESS); 944
885} 945 close(dev_fd);
946 exit(EXIT_SUCCESS);
947 }
886 948
887Locking between applications 949Locking between applications
888---------------------------- 950----------------------------
@@ -921,86 +983,86 @@ Read lock:
921 B) If success read the directory structure. 983 B) If success read the directory structure.
922 C) funlock 984 C) funlock
923 985
924Example with bash: 986Example with bash::
925 987
926# Atomically read directory structure 988 # Atomically read directory structure
927$ flock -s /sys/fs/resctrl/ find /sys/fs/resctrl 989 $ flock -s /sys/fs/resctrl/ find /sys/fs/resctrl
928 990
929# Read directory contents and create new subdirectory 991 # Read directory contents and create new subdirectory
930 992
931$ cat create-dir.sh 993 $ cat create-dir.sh
932find /sys/fs/resctrl/ > output.txt 994 find /sys/fs/resctrl/ > output.txt
933mask = function-of(output.txt) 995 mask = function-of(output.txt)
934mkdir /sys/fs/resctrl/newres/ 996 mkdir /sys/fs/resctrl/newres/
935echo mask > /sys/fs/resctrl/newres/schemata 997 echo mask > /sys/fs/resctrl/newres/schemata
936 998
937$ flock /sys/fs/resctrl/ ./create-dir.sh 999 $ flock /sys/fs/resctrl/ ./create-dir.sh
938 1000
939Example with C: 1001Example with C::
940 1002
941/* 1003 /*
942 * Example code to take advisory locks 1004 * Example code to take advisory locks
943 * before accessing resctrl filesystem 1005 * before accessing resctrl filesystem
944 */ 1006 */
945#include <sys/file.h> 1007 #include <sys/file.h>
946#include <stdlib.h> 1008 #include <stdlib.h>
947 1009
948void resctrl_take_shared_lock(int fd) 1010 void resctrl_take_shared_lock(int fd)
949{ 1011 {
950 int ret; 1012 int ret;
951 1013
952 /* take shared lock on resctrl filesystem */ 1014 /* take shared lock on resctrl filesystem */
953 ret = flock(fd, LOCK_SH); 1015 ret = flock(fd, LOCK_SH);
954 if (ret) { 1016 if (ret) {
955 perror("flock"); 1017 perror("flock");
956 exit(-1); 1018 exit(-1);
957 } 1019 }
958} 1020 }
959 1021
960void resctrl_take_exclusive_lock(int fd) 1022 void resctrl_take_exclusive_lock(int fd)
961{ 1023 {
962 int ret; 1024 int ret;
963 1025
964 /* take exclusive lock on resctrl filesystem */ 1026 /* take exclusive lock on resctrl filesystem */
965 ret = flock(fd, LOCK_EX); 1027 ret = flock(fd, LOCK_EX);
966 if (ret) { 1028 if (ret) {
967 perror("flock"); 1029 perror("flock");
968 exit(-1); 1030 exit(-1);
969 } 1031 }
970} 1032 }
971 1033
972void resctrl_release_lock(int fd) 1034 void resctrl_release_lock(int fd)
973{ 1035 {
974 int ret; 1036 int ret;
975 1037
976 /* release lock on resctrl filesystem */ 1038 /* release lock on resctrl filesystem */
977 ret = flock(fd, LOCK_UN); 1039 ret = flock(fd, LOCK_UN);
978 if (ret) { 1040 if (ret) {
979 perror("flock"); 1041 perror("flock");
980 exit(-1); 1042 exit(-1);
981 } 1043 }
982} 1044 }
983 1045
984void main(void) 1046 void main(void)
985{ 1047 {
986 int fd, ret; 1048 int fd, ret;
987 1049
988 fd = open("/sys/fs/resctrl", O_DIRECTORY); 1050 fd = open("/sys/fs/resctrl", O_DIRECTORY);
989 if (fd == -1) { 1051 if (fd == -1) {
990 perror("open"); 1052 perror("open");
991 exit(-1); 1053 exit(-1);
992 } 1054 }
993 resctrl_take_shared_lock(fd); 1055 resctrl_take_shared_lock(fd);
994 /* code to read directory contents */ 1056 /* code to read directory contents */
995 resctrl_release_lock(fd); 1057 resctrl_release_lock(fd);
996 1058
997 resctrl_take_exclusive_lock(fd); 1059 resctrl_take_exclusive_lock(fd);
998 /* code to read and write directory contents */ 1060 /* code to read and write directory contents */
999 resctrl_release_lock(fd); 1061 resctrl_release_lock(fd);
1000} 1062 }
1001 1063
1002Examples for RDT Monitoring along with allocation usage: 1064Examples for RDT Monitoring along with allocation usage
1003 1065=======================================================
1004Reading monitored data 1066Reading monitored data
1005---------------------- 1067----------------------
1006Reading an event file (for ex: mon_data/mon_L3_00/llc_occupancy) would 1068Reading an event file (for ex: mon_data/mon_L3_00/llc_occupancy) would
@@ -1009,17 +1071,17 @@ group or CTRL_MON group.
1009 1071
1010 1072
1011Example 1 (Monitor CTRL_MON group and subset of tasks in CTRL_MON group) 1073Example 1 (Monitor CTRL_MON group and subset of tasks in CTRL_MON group)
1012--------- 1074------------------------------------------------------------------------
1013On a two socket machine (one L3 cache per socket) with just four bits 1075On a two socket machine (one L3 cache per socket) with just four bits
1014for cache bit masks 1076for cache bit masks::
1015 1077
1016# mount -t resctrl resctrl /sys/fs/resctrl 1078 # mount -t resctrl resctrl /sys/fs/resctrl
1017# cd /sys/fs/resctrl 1079 # cd /sys/fs/resctrl
1018# mkdir p0 p1 1080 # mkdir p0 p1
1019# echo "L3:0=3;1=c" > /sys/fs/resctrl/p0/schemata 1081 # echo "L3:0=3;1=c" > /sys/fs/resctrl/p0/schemata
1020# echo "L3:0=3;1=3" > /sys/fs/resctrl/p1/schemata 1082 # echo "L3:0=3;1=3" > /sys/fs/resctrl/p1/schemata
1021# echo 5678 > p1/tasks 1083 # echo 5678 > p1/tasks
1022# echo 5679 > p1/tasks 1084 # echo 5679 > p1/tasks
1023 1085
1024The default resource group is unmodified, so we have access to all parts 1086The default resource group is unmodified, so we have access to all parts
1025of all caches (its schemata file reads "L3:0=f;1=f"). 1087of all caches (its schemata file reads "L3:0=f;1=f").
@@ -1029,47 +1091,51 @@ Tasks that are under the control of group "p0" may only allocate from the
1029Tasks in group "p1" use the "lower" 50% of cache on both sockets. 1091Tasks in group "p1" use the "lower" 50% of cache on both sockets.
1030 1092
1031Create monitor groups and assign a subset of tasks to each monitor group. 1093Create monitor groups and assign a subset of tasks to each monitor group.
1094::
1032 1095
1033# cd /sys/fs/resctrl/p1/mon_groups 1096 # cd /sys/fs/resctrl/p1/mon_groups
1034# mkdir m11 m12 1097 # mkdir m11 m12
1035# echo 5678 > m11/tasks 1098 # echo 5678 > m11/tasks
1036# echo 5679 > m12/tasks 1099 # echo 5679 > m12/tasks
1037 1100
1038fetch data (data shown in bytes) 1101fetch data (data shown in bytes)
1102::
1039 1103
1040# cat m11/mon_data/mon_L3_00/llc_occupancy 1104 # cat m11/mon_data/mon_L3_00/llc_occupancy
104116234000 1105 16234000
1042# cat m11/mon_data/mon_L3_01/llc_occupancy 1106 # cat m11/mon_data/mon_L3_01/llc_occupancy
104314789000 1107 14789000
1044# cat m12/mon_data/mon_L3_00/llc_occupancy 1108 # cat m12/mon_data/mon_L3_00/llc_occupancy
104516789000 1109 16789000
1046 1110
1047The parent ctrl_mon group shows the aggregated data. 1111The parent ctrl_mon group shows the aggregated data.
1112::
1048 1113
1049# cat /sys/fs/resctrl/p1/mon_data/mon_l3_00/llc_occupancy 1114 # cat /sys/fs/resctrl/p1/mon_data/mon_l3_00/llc_occupancy
105031234000 1115 31234000
1051 1116
1052Example 2 (Monitor a task from its creation) 1117Example 2 (Monitor a task from its creation)
1053--------- 1118--------------------------------------------
1054On a two socket machine (one L3 cache per socket) 1119On a two socket machine (one L3 cache per socket)::
1055 1120
1056# mount -t resctrl resctrl /sys/fs/resctrl 1121 # mount -t resctrl resctrl /sys/fs/resctrl
1057# cd /sys/fs/resctrl 1122 # cd /sys/fs/resctrl
1058# mkdir p0 p1 1123 # mkdir p0 p1
1059 1124
1060An RMID is allocated to the group once it is created and hence the <cmd> 1125An RMID is allocated to the group once it is created and hence the <cmd>
1061below is monitored from its creation. 1126below is monitored from its creation.
1127::
1062 1128
1063# echo $$ > /sys/fs/resctrl/p1/tasks 1129 # echo $$ > /sys/fs/resctrl/p1/tasks
1064# <cmd> 1130 # <cmd>
1065 1131
1066Fetch the data 1132Fetch the data::
1067 1133
1068# cat /sys/fs/resctrl/p1/mon_data/mon_l3_00/llc_occupancy 1134 # cat /sys/fs/resctrl/p1/mon_data/mon_l3_00/llc_occupancy
106931789000 1135 31789000
1070 1136
1071Example 3 (Monitor without CAT support or before creating CAT groups) 1137Example 3 (Monitor without CAT support or before creating CAT groups)
1072--------- 1138---------------------------------------------------------------------
1073 1139
1074Assume a system like HSW has only CQM and no CAT support. In this case 1140Assume a system like HSW has only CQM and no CAT support. In this case
1075the resctrl will still mount but cannot create CTRL_MON directories. 1141the resctrl will still mount but cannot create CTRL_MON directories.
@@ -1078,27 +1144,29 @@ able to monitor all tasks including kernel threads.
1078 1144
1079This can also be used to profile jobs' cache size footprint before being 1145This can also be used to profile jobs' cache size footprint before being
1080able to allocate them to different allocation groups. 1146able to allocate them to different allocation groups.
1147::
1081 1148
1082# mount -t resctrl resctrl /sys/fs/resctrl 1149 # mount -t resctrl resctrl /sys/fs/resctrl
1083# cd /sys/fs/resctrl 1150 # cd /sys/fs/resctrl
1084# mkdir mon_groups/m01 1151 # mkdir mon_groups/m01
1085# mkdir mon_groups/m02 1152 # mkdir mon_groups/m02
1086 1153
1087# echo 3478 > /sys/fs/resctrl/mon_groups/m01/tasks 1154 # echo 3478 > /sys/fs/resctrl/mon_groups/m01/tasks
1088# echo 2467 > /sys/fs/resctrl/mon_groups/m02/tasks 1155 # echo 2467 > /sys/fs/resctrl/mon_groups/m02/tasks
1089 1156
1090Monitor the groups separately and also get per domain data. From the 1157Monitor the groups separately and also get per domain data. From the
1091below it is apparent that the tasks are mostly doing work on 1158below it is apparent that the tasks are mostly doing work on
1092domain(socket) 0. 1159domain(socket) 0.
1160::
1093 1161
1094# cat /sys/fs/resctrl/mon_groups/m01/mon_L3_00/llc_occupancy 1162 # cat /sys/fs/resctrl/mon_groups/m01/mon_L3_00/llc_occupancy
109531234000 1163 31234000
1096# cat /sys/fs/resctrl/mon_groups/m01/mon_L3_01/llc_occupancy 1164 # cat /sys/fs/resctrl/mon_groups/m01/mon_L3_01/llc_occupancy
109734555 1165 34555
1098# cat /sys/fs/resctrl/mon_groups/m02/mon_L3_00/llc_occupancy 1166 # cat /sys/fs/resctrl/mon_groups/m02/mon_L3_00/llc_occupancy
109931234000 1167 31234000
1100# cat /sys/fs/resctrl/mon_groups/m02/mon_L3_01/llc_occupancy 1168 # cat /sys/fs/resctrl/mon_groups/m02/mon_L3_01/llc_occupancy
110132789 1169 32789
1102 1170
1103 1171
1104Example 4 (Monitor real time tasks) 1172Example 4 (Monitor real time tasks)
@@ -1107,15 +1175,17 @@ Example 4 (Monitor real time tasks)
1107A single socket system which has real time tasks running on cores 4-7 1175A single socket system which has real time tasks running on cores 4-7
1108and non real time tasks on other cpus. We want to monitor the cache 1176and non real time tasks on other cpus. We want to monitor the cache
1109occupancy of the real time threads on these cores. 1177occupancy of the real time threads on these cores.
1178::
1179
1180 # mount -t resctrl resctrl /sys/fs/resctrl
1181 # cd /sys/fs/resctrl
1182 # mkdir p1
1110 1183
1111# mount -t resctrl resctrl /sys/fs/resctrl 1184Move the cpus 4-7 over to p1::
1112# cd /sys/fs/resctrl
1113# mkdir p1
1114 1185
1115Move the cpus 4-7 over to p1 1186 # echo f0 > p1/cpus
1116# echo f0 > p1/cpus
1117 1187
1118View the llc occupancy snapshot 1188View the llc occupancy snapshot::
1119 1189
1120# cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy 1190 # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
112111234000 1191 11234000