tracing, documentation: Add a document on the kmem tracepoints

Knowing tracepoints exist is not quite the same as knowing what they should be used for. This patch adds a document giving a basic description of the kmem tracepoints and why they might be useful to a performance analyst.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Li Ming Chun <macli@brc.ubc.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
author: Mel Gorman <mel@csn.ul.ie> 2009-09-21 20:02:49 -0400
committer: Linus Torvalds <torvalds@linux-foundation.org> 2009-09-22 10:17:34 -0400
commit: 8fbb398f5c78832ee61e0d5ed0793fa8857bd853 (patch)
tree: 889fc12c8eefc642ce9f368e76f9baf2bc3ef2ee /Documentation/trace
parent: bb72222086260695d71afe60fa105649c1ea9463 (diff)
1 files changed, 107 insertions, 0 deletions
diff --git a/Documentation/trace/events-kmem.txt b/Documentation/trace/events-kmem.txt
new file mode 100644
index 000000000000..6ef2a8652e17
--- /dev/null
+++ b/Documentation/trace/events-kmem.txt
@@ -0,0 +1,107 @@
+                        Subsystem Trace Points: kmem
+The tracing system kmem captures events related to object and page allocation
+within the kernel. Broadly speaking there are five major subheadings.
+  o Slab allocation of small objects of unknown type (kmalloc)
+  o Slab allocation of small objects of known type
+  o Page allocation
+  o Per-CPU Allocator Activity
+  o External Fragmentation
+This document describes what each of the tracepoints is and why it
+might be useful.
+1. Slab allocation of small objects of unknown type
+===================================================
+kmalloc         call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
+kmalloc_node    call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
+kfree           call_site=%lx ptr=%p
+Heavy activity for these events may indicate that a specific cache is
+justified, particularly if kmalloc slab pages are getting significantly
+internally fragmented as a result of the allocation pattern. By correlating
+kmalloc with kfree, it may be possible to identify memory leaks and the
+call sites that allocated the leaked memory.
+2. Slab allocation of small objects of known type
+=================================================
+kmem_cache_alloc        call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
+kmem_cache_alloc_node   call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
+kmem_cache_free         call_site=%lx ptr=%p
+These events are similar in usage to the kmalloc-related events except that
+it is likely easier to pin the event down to a specific cache. At the time
+of writing, no information is available on what slab is being allocated from,
+but the call_site can usually be used to extrapolate that information.
+3. Page allocation
+==================
+mm_page_alloc             page=%p pfn=%lu order=%d migratetype=%d gfp_flags=%s
+mm_page_alloc_zone_locked page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
+mm_page_free_direct       page=%p pfn=%lu order=%d
+mm_pagevec_free           page=%p pfn=%lu order=%d cold=%d
+These four events deal with page allocation and freeing. mm_page_alloc is
+a simple indicator of page allocator activity. Pages may be allocated from
+the per-CPU allocator (high performance) or the buddy allocator.
+If pages are allocated directly from the buddy allocator, the
+mm_page_alloc_zone_locked event is triggered. This event is important as high
+amounts of activity imply high activity on the zone->lock. Taking this lock
+impairs performance by disabling interrupts, dirtying cache lines between
+CPUs and serialising many CPUs.
+When a page is freed directly by the caller, the mm_page_free_direct event
+is triggered. Significant amounts of activity here could indicate that the
+callers should be batching their activities.
+When pages are freed using a pagevec, the mm_pagevec_free event is
+triggered. Broadly speaking, pages are taken off the LRU list in bulk and
+freed in batch with a pagevec. Significant amounts of activity here could
+indicate that the system is under memory pressure and can also indicate
+contention on the zone->lru_lock.
+4. Per-CPU Allocator Activity
+=============================
+mm_page_alloc_zone_locked       page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
+mm_page_pcpu_drain              page=%p pfn=%lu order=%d cpu=%d migratetype=%d
+In front of the page allocator is a per-CPU page allocator. It exists only
+for order-0 pages, reduces contention on the zone->lock and reduces the
+amount of writing on struct page.
+When a per-CPU list is empty or pages of the wrong type are allocated,
+the zone->lock will be taken once and the per-CPU list refilled. The event
+triggered is mm_page_alloc_zone_locked for each page allocated with the
+event indicating whether it is for a percpu_refill or not.
+When the per-CPU list is too full, a number of pages are freed, each of
+which triggers a mm_page_pcpu_drain event.
+The events are emitted individually per page so that pages can be tracked
+between allocation and freeing. A run of drain or refill events that occur
+consecutively implies the zone->lock was taken once. Large amounts of PCP
+refills and drains could imply an imbalance between CPUs where too much work
+is being concentrated in one place. It could also indicate that the per-CPU
+lists should be a larger size. Finally, large amounts of refills on one CPU
+and drains on another could be a factor in causing large amounts of cache
+line bounces due to writes between CPUs, so it is worth investigating whether
+pages can be allocated and freed on the same CPU through some algorithm change.
+5. External Fragmentation
+=========================
+mm_page_alloc_extfrag           page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d
+External fragmentation affects whether a high-order allocation will be
+successful or not. Some types of hardware depend on high-order allocations,
+although such allocations are avoided where possible. If the system is using
+huge pages and needs to be able to resize the pool over the lifetime of the
+system, this event is important.
+Large numbers of this event imply that memory is fragmenting and that
+high-order allocations will start failing at some time in the future. One
+means of reducing the occurrence of this event is to increase
+min_free_kbytes in increments of 3*pageblock_size*nr_online_nodes, where
+pageblock_size is usually the default hugepage size.
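
The min_free_kbytes increment given in section 5 of the new document can be worked through as a short calculation. The sketch below is not part of the patch. It assumes that pageblock_size is expressed in kilobytes so that the result is in the same unit as min_free_kbytes, which the document does not state explicitly. The function names and the example values (2048 kB pageblocks, 2 online nodes, a current setting of 67584 kB) are hypothetical.

```python
def min_free_kbytes_step(pageblock_kbytes: int, nr_online_nodes: int) -> int:
    """Return one increment, following 3 * pageblock_size * nr_online_nodes.

    Assumption: pageblock_kbytes is the pageblock size in kB, so the
    result is also in kB.
    """
    if pageblock_kbytes <= 0 or nr_online_nodes <= 0:
        raise ValueError("both arguments must be positive")
    return 3 * pageblock_kbytes * nr_online_nodes


def raised_min_free_kbytes(current_kbytes: int, pageblock_kbytes: int,
                           nr_online_nodes: int, steps: int = 1) -> int:
    """Return a candidate min_free_kbytes after adding `steps` increments."""
    step = min_free_kbytes_step(pageblock_kbytes, nr_online_nodes)
    return current_kbytes + steps * step


if __name__ == "__main__":
    # Hypothetical machine: 2 MiB pageblocks (2048 kB) and 2 online nodes.
    print(min_free_kbytes_step(2048, 2))           # 12288
    # Hypothetical current setting of 67584 kB, raised by one increment.
    print(raised_min_free_kbytes(67584, 2048, 2))  # 79872
```

The code only computes candidate values. Whether a given value is appropriate depends on the workload, and the count of mm_page_alloc_extfrag events should be checked again after each increment.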