Linux-2.6.12-rc2 (v2.6.12-rc2)

Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
author: Linus Torvalds <torvalds@ppc970.osdl.org> 2005-04-16 18:20:36 -0400
committer: Linus Torvalds <torvalds@ppc970.osdl.org> 2005-04-16 18:20:36 -0400
commit: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 (patch)
tree: 0bba044c4ce775e45a88a51686b5d9f90697ea9d /Documentation/vm/balance
1 files changed, 93 insertions, 0 deletions
diff --git a/Documentation/vm/balance b/Documentation/vm/balance
new file mode 100644
index 000000000000..bd3d31bc4915
--- /dev/null
+++ b/Documentation/vm/balance
@@ -0,0 +1,93 @@
+Started Jan 2000 by Kanoj Sarcar <kanoj@sgi.com>
+
+Memory balancing is needed for non __GFP_WAIT as well as for non
+__GFP_IO allocations.
+
+There are two reasons to be requesting non __GFP_WAIT allocations:
+the caller can not sleep (typically intr context), or does not want
+to incur cost overheads of page stealing and possible swap io for
+whatever reasons.
+
+Non __GFP_IO allocation requests are made to prevent file system deadlocks.
+
+In the absence of non sleepable allocation requests, it seems detrimental
+to be doing balancing. Page reclamation can be kicked off lazily, that
+is, only when needed (i.e. when zone free memory is 0), instead of making it
+a proactive process.
+
+That being said, the kernel should try to fulfill requests for direct
+mapped pages from the direct mapped pool, instead of falling back on
+the dma pool, so as to keep the dma pool filled for dma requests (atomic
+or not). A similar argument applies to highmem and direct mapped pages.
+OTOH, if there are a lot of free dma pages, it is preferable to satisfy
+regular memory requests by allocating one from the dma pool, instead
+of incurring the overhead of regular zone balancing.
+
+In 2.2, memory balancing/page reclamation would kick off only when the
+_total_ number of free pages fell below 1/64th of total memory. With the
+right ratio of dma and regular memory, it is quite possible that balancing
+would not be done even when the dma zone was completely empty. 2.2 has
+been running production machines of varying memory sizes, and seems to be
+doing fine even with the presence of this problem. In 2.3, due to
+HIGHMEM, this problem is aggravated.
+
+In 2.3, zone balancing can be done in one of two ways: depending on the
+zone size (and possibly on the size of lower class zones), we can decide
+at init time how many free pages we should aim for while balancing any
+zone. The good part is that, while balancing, we do not need to look at
+sizes of lower class zones; the bad part is that we might balance too
+often because we ignore possibly lower usage in the lower class zones. Also,
+with a slight change in the allocation routine, it is possible to reduce
+the memclass() macro to be a simple equality.
+
+Another possible solution is that we balance only when the free memory
+of a zone _and_ all its lower class zones falls below 1/64th of the
+total memory in the zone and its lower class zones. This fixes the 2.2
+balancing problem, and stays as close to 2.2 behavior as possible. Also,
+the balancing algorithm works the same way on the various architectures,
+which have different numbers and types of zones. If we wanted to get
+fancy, we could assign different weights to free pages in different
+zones in the future.
+
+Note that if the size of the regular zone is huge compared to dma zone,
+it becomes less significant to consider the free dma pages while
+deciding whether to balance the regular zone. The first solution
+becomes more attractive then.
+
+The appended patch implements the second solution. It also "fixes" two
+problems: first, kswapd is woken up as in 2.2 on low memory conditions
+for non-sleepable allocations. Second, the HIGHMEM zone is also balanced,
+so as to give a fighting chance for replace_with_highmem() to get a
+HIGHMEM page, as well as to ensure that HIGHMEM allocations do not
+fall back into regular zone. This also makes sure that HIGHMEM pages
+are not leaked (for example, in situations where a HIGHMEM page is in 
+the swapcache but is not being used by anyone).
+
+kswapd also needs to know about the zones it should balance. kswapd is
+primarily needed in a situation where balancing can not be done, 
+probably because all allocation requests are coming from intr context
+and all process contexts are sleeping. For 2.3, kswapd does not really
+need to balance the highmem zone, since intr context does not request
+highmem pages. kswapd looks at the zone_wake_kswapd field in the zone
+structure to decide whether a zone needs balancing.
+
+Page stealing from process memory and shm is done if stealing the page would
+alleviate memory pressure on any zone in the page's node that has fallen below
+its watermark.
+
+pages_min/pages_low/pages_high/low_on_memory/zone_wake_kswapd: These are
+per-zone fields, used to determine when a zone needs to be balanced. When
+the number of free pages falls below pages_min, the hysteretic field
+low_on_memory gets set. It stays set until the number of free pages reaches
+pages_high. When low_on_memory is set, page allocation requests will try to
+free some pages in the zone (provided __GFP_WAIT is set in the request).
+Orthogonal to this is the decision to poke kswapd to free some zone pages.
+That decision is not hysteresis based, and is made when the number of free
+pages is below pages_low, in which case zone_wake_kswapd is also set.
+
+
+(Good) Ideas that I have heard:
+1. Dynamic experience should influence balancing: number of failed requests
+for a zone can be tracked and fed into the balancing scheme (jalvo@mbay.net)
+2. Implement a replace_with_highmem()-like replace_with_regular() to preserve
+dma pages. (lkd@tantalophile.demon.co.uk)
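
The watermark fields and the "second solution" described in Documentation/vm/balance above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the code from the patch: struct zone_sketch, its size and free_pages members, and the functions zone_update_flags() and zone_needs_balance() are hypothetical names. The sketch also assumes that zone_wake_kswapd is cleared as soon as free pages are no longer below pages_low (the text only says when it is set), and that the zone array is ordered from the lowest class (dma) upward.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-zone fields named in the text. */
struct zone_sketch {
	unsigned long size;        /* total pages in the zone (assumed field) */
	unsigned long free_pages;  /* currently free pages (assumed field) */
	unsigned long pages_min;
	unsigned long pages_low;
	unsigned long pages_high;
	bool low_on_memory;        /* hysteretic flag */
	bool zone_wake_kswapd;     /* non-hysteretic flag */
};

/* Recompute both flags after free_pages has changed. */
void zone_update_flags(struct zone_sketch *z)
{
	/*
	 * Hysteresis: the flag is set below pages_min and cleared only
	 * once free pages have climbed all the way back to pages_high.
	 * Between the two watermarks it keeps its previous value.
	 */
	if (z->free_pages < z->pages_min)
		z->low_on_memory = true;
	else if (z->free_pages >= z->pages_high)
		z->low_on_memory = false;

	/*
	 * No hysteresis: a plain comparison against pages_low.
	 * Clearing the flag here is an assumption of this sketch.
	 */
	z->zone_wake_kswapd = z->free_pages < z->pages_low;
}

/*
 * "Second solution": zone idx needs balancing when the free memory of
 * the zone and all its lower class zones falls below 1/64th of the
 * total memory in those zones. zones[0] is assumed to be the lowest
 * class (dma) zone.
 */
bool zone_needs_balance(const struct zone_sketch *zones, size_t idx)
{
	unsigned long free = 0, total = 0;
	size_t i;

	for (i = 0; i <= idx; i++) {
		free += zones[i].free_pages;
		total += zones[i].size;
	}
	return free < total / 64;
}
```

With an empty 4096-page dma zone and a 28672-page regular zone holding 1000 free pages, a single 2.2-style check on the totals would not trigger balancing, while the per-zone cumulative check still flags the dma zone.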