Subsystem Trace Points: kmem

The tracing system kmem captures events related to object and page allocation
within the kernel. Broadly speaking there are five major subheadings.

o Slab allocation of small objects of unknown type (kmalloc)
o Slab allocation of small objects of known type
o Page allocation
o Per-CPU Allocator Activity
o External Fragmentation

This document describes what each of the tracepoints is and why it
might be useful.
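
For reference, the kmem events can be enabled at run time through the tracing
filesystem and the resulting events read from trace_pipe. The following is a
minimal sketch, assuming tracefs (or debugfs tracing) is mounted at one of the
usual locations and that the script runs with sufficient privilege.

  #!/usr/bin/env python3
  # Enable every tracepoint in the kmem subsystem and stream the events.
  # Assumes the tracing filesystem is mounted at one of the paths below.
  import os

  CANDIDATES = ["/sys/kernel/tracing", "/sys/kernel/debug/tracing"]
  TRACEFS = next(p for p in CANDIDATES if os.path.isdir(p))

  def write(path, value):
      with open(os.path.join(TRACEFS, path), "w") as f:
          f.write(value)

  write("events/kmem/enable", "1")   # enable all events under the kmem subsystem
  write("tracing_on", "1")           # make sure the ring buffer is recording

  # trace_pipe blocks until events arrive and consumes them as they are read.
  with open(os.path.join(TRACEFS, "trace_pipe")) as pipe:
      for line in pipe:
          print(line, end="")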

1. Slab allocation of small objects of unknown type
===================================================
kmalloc call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
kmalloc_node call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
kfree call_site=%lx ptr=%p

Heavy activity for these events may indicate that a specific cache is
justified, particularly if kmalloc slab pages are getting significantly
internally fragmented as a result of the allocation pattern. By correlating
kmalloc with kfree, it may be possible to identify memory leaks and where
the allocation sites were.
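
As a rough illustration, kmalloc and kfree events can be matched on the ptr
field to flag allocations that were never freed. This is a minimal sketch over
the default text trace format (for example, the contents of the trace file
piped in on stdin); because ptr values are recycled, only a sustained imbalance
is meaningful.

  #!/usr/bin/env python3
  # Pair kmalloc/kfree events on the ptr field to highlight potential leaks.
  import re
  import sys

  alloc_re = re.compile(r"kmalloc(?:_node)?: call_site=(\S+) ptr=(\S+)")
  free_re  = re.compile(r"kfree: call_site=\S+ ptr=(\S+)")

  live = {}                          # ptr -> call_site of the allocation
  for line in sys.stdin:
      m = alloc_re.search(line)
      if m:
          live[m.group(2)] = m.group(1)
          continue
      m = free_re.search(line)
      if m:
          live.pop(m.group(1), None)

  for ptr, call_site in live.items():
      print(f"possibly leaked: ptr={ptr} allocated from call_site={call_site}")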


2. Slab allocation of small objects of known type
=================================================
kmem_cache_alloc call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
kmem_cache_alloc_node call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
kmem_cache_free call_site=%lx ptr=%p

These events are similar in usage to the kmalloc-related events, except that
it is likely easier to pin the event down to a specific cache. At the time
of writing, no information is available on which cache is being allocated from,
but the call_site can usually be used to extrapolate that information.
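
If call_site is printed as a raw address, it can usually be mapped back to a
kernel symbol with /proc/kallsyms. The following is a minimal sketch, assuming
the trace was captured on the same running kernel and that kallsyms addresses
are readable (they may be zeroed for unprivileged users); the example address
is illustrative only.

  #!/usr/bin/env python3
  # Resolve a raw call_site address to the nearest preceding kernel symbol.
  import bisect

  symbols = []                                 # list of (address, name)
  with open("/proc/kallsyms") as f:
      for line in f:
          addr, _type, name = line.split()[:3]
          symbols.append((int(addr, 16), name))
  symbols.sort()
  addrs = [a for a, _ in symbols]

  def resolve(call_site_hex):
      addr = int(call_site_hex, 16)
      i = bisect.bisect_right(addrs, addr) - 1
      return symbols[i][1] if i >= 0 else "unknown"

  print(resolve("ffffffff811d2c5a"))           # example address, illustrative only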

3. Page allocation
==================
mm_page_alloc page=%p pfn=%lu order=%d migratetype=%d gfp_flags=%s
mm_page_alloc_zone_locked page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
mm_page_free_direct page=%p pfn=%lu order=%d
mm_pagevec_free page=%p pfn=%lu order=%d cold=%d

These four events deal with page allocation and freeing. mm_page_alloc is
a simple indicator of page allocator activity. Pages may be allocated from
the per-CPU allocator (high performance) or the buddy allocator.

If pages are allocated directly from the buddy allocator, the
mm_page_alloc_zone_locked event is triggered. This event is important as high
amounts of activity imply high activity on the zone->lock. Taking this lock
impairs performance by disabling interrupts, dirtying cache lines between
CPUs and serialising many CPUs.

When a page is freed directly by the caller, the mm_page_free_direct event
is triggered. Significant amounts of activity here could indicate that the
callers should be batching their activities.

When pages are freed using a pagevec, the mm_pagevec_free event is
triggered. Broadly speaking, pages are taken off the LRU in bulk and
freed in batch with a pagevec. Significant amounts of activity here could
indicate that the system is under memory pressure and can also indicate
contention on the zone->lru_lock.
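
As a rough way of judging how much traffic bypasses the per-CPU lists and which
free path dominates, the four events above can simply be counted and compared.
A minimal sketch over the default text trace format, reading from stdin:

  #!/usr/bin/env python3
  # Tally page allocation/free events to compare allocator paths.
  import re
  import sys
  from collections import Counter

  events = ("mm_page_alloc_zone_locked", "mm_page_alloc",
            "mm_page_free_direct", "mm_pagevec_free")
  counts = Counter()

  for line in sys.stdin:
      for name in events:
          if re.search(r"\b%s:" % name, line):
              counts[name] += 1
              break

  for name in events:
      print(f"{name:28s} {counts[name]}")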

4. Per-CPU Allocator Activity
=============================
mm_page_alloc_zone_locked page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
mm_page_pcpu_drain page=%p pfn=%lu order=%d cpu=%d migratetype=%d

In front of the page allocator is a per-CPU page allocator. It exists only
for order-0 pages, reduces contention on the zone->lock and reduces the
amount of writing on struct page.

When a per-CPU list is empty, or pages of the wrong type are allocated,
the zone->lock will be taken once and the per-CPU list refilled. An
mm_page_alloc_zone_locked event is triggered for each page allocated, with
the event indicating whether it is for a percpu_refill or not.

When the per-CPU list is too full, a number of pages are freed, each of
which triggers an mm_page_pcpu_drain event.

The events are emitted on a per-page basis so that pages can be tracked
between allocation and freeing. A run of consecutive drain or refill events
implies that the zone->lock was taken once. Large amounts of PCP
refills and drains could imply an imbalance between CPUs where too much work
is being concentrated in one place. It could also indicate that the per-CPU
lists should be a larger size. Finally, large amounts of refills on one CPU
and drains on another could be a factor in causing large amounts of cache
line bounces due to writes between CPUs, and it is worth investigating whether
pages can be allocated and freed on the same CPU through some algorithm change.
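
One way of visualising this is to group consecutive refill and drain events
into bursts, since each burst roughly corresponds to one acquisition of the
zone->lock. The following is a minimal sketch over the default text trace
format; because events from different CPUs interleave in the merged buffer,
the per-CPU trace files (per_cpu/cpuX/trace) give cleaner bursts.

  #!/usr/bin/env python3
  # Group consecutive PCP refill/drain events into bursts per CPU.
  # Each burst approximates one zone->lock acquisition.
  import re
  import sys

  refill_re = re.compile(r"mm_page_alloc_zone_locked:.*cpu=(\d+).*percpu_refill=1")
  drain_re  = re.compile(r"mm_page_pcpu_drain:.*cpu=(\d+)")

  bursts = []                       # list of [kind, cpu, number of pages]
  last = None                       # (kind, cpu) of the burst being built

  for line in sys.stdin:
      m = refill_re.search(line)
      kind = "refill" if m else None
      if not m:
          m = drain_re.search(line)
          kind = "drain" if m else None
      if not m:
          last = None               # an unrelated event ends the current burst
          continue
      key = (kind, m.group(1))
      if key == last:
          bursts[-1][2] += 1        # extend the burst with another page
      else:
          bursts.append([kind, m.group(1), 1])
          last = key

  for kind, cpu, pages in bursts:
      print(f"cpu={cpu} {kind} burst of {pages} pages")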

5. External Fragmentation
=========================
mm_page_alloc_extfrag page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d

External fragmentation affects whether a high-order allocation will be
successful or not. For some types of hardware, this is important although
it is avoided where possible. If the system is using huge pages and needs
to be able to resize the pool over the lifetime of the system, this value
is important.

Large numbers of this event imply that memory is fragmenting and
high-order allocations will start failing at some time in the future. One
means of reducing the occurrence of this event is to increase the size of
min_free_kbytes in increments of 3*pageblock_size*nr_online_nodes, where
pageblock_size is usually the size of the default hugepage.
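
As a worked example of that increment, the sketch below computes
3*pageblock_size*nr_online_nodes in kilobytes. It assumes a 2MB pageblock
(the common default hugepage size on x86) and reads the node count from
sysfs; both the constant and the sysfs path are assumptions for illustration.

  #!/usr/bin/env python3
  # Worked example: one increment of min_free_kbytes is
  # 3 * pageblock_size * nr_online_nodes, expressed in kilobytes.
  PAGEBLOCK_SIZE = 2 * 1024 * 1024       # assume a 2MB pageblock/hugepage size

  with open("/sys/devices/system/node/online") as f:
      # Contents look like "0" or "0-1"; count the online nodes from the ranges.
      ranges = f.read().strip().split(",")
      nr_online_nodes = sum(
          int(r.split("-")[1]) - int(r.split("-")[0]) + 1 if "-" in r else 1
          for r in ranges)

  increment_kb = 3 * PAGEBLOCK_SIZE * nr_online_nodes // 1024
  print(f"increase min_free_kbytes in steps of {increment_kb} kB "
        f"({nr_online_nodes} node(s))")

On a single-node system with a 2MB pageblock this works out to 6144 kB per step.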