Documentation/vm/balance



Started Jan 2000 by Kanoj Sarcar <kanoj@sgi.com>

Memory balancing is needed for non __GFP_WAIT as well as for non
__GFP_IO allocations.

There are two reasons to be requesting non __GFP_WAIT allocations:
the caller can not sleep (typically intr context), or does not want
to incur cost overheads of page stealing and possible swap io for
whatever reasons.

Non __GFP_IO allocation requests are made to prevent file system deadlocks.

In the absence of non sleepable allocation requests, it seems detrimental
to be doing balancing. Page reclamation can be kicked off lazily, that
is, only when needed (i.e. when zone free memory is 0), instead of making
it a proactive process.

That being said, the kernel should try to fulfill requests for direct
mapped pages from the direct mapped pool, instead of falling back on
the dma pool, so as to keep the dma pool filled for dma requests (atomic
or not). A similar argument applies to highmem and direct mapped pages.
OTOH, if there are a lot of free dma pages, it is preferable to satisfy
regular memory requests by allocating one from the dma pool, instead
of incurring the overhead of regular zone balancing.

In 2.2, memory balancing/page reclamation would kick off only when the
_total_ number of free pages fell below 1/64th of total memory. With the
right ratio of dma and regular memory, it is quite possible that balancing
would not be done even when the dma zone was completely empty. 2.2 has
been running production machines of varying memory sizes, and seems to be
doing fine even with the presence of this problem. In 2.3, due to
HIGHMEM, this problem is aggravated.

In 2.3, zone balancing can be done in one of two ways. In the first,
depending on the zone size (and possibly on the size of lower class zones),
we decide at init time how many free pages we should aim for while
balancing any zone. The good part is that, while balancing, we do not need
to look at the sizes of lower class zones; the bad part is that we might
balance too frequently, because we ignore possibly lower usage in the
lower class zones. Also, with a slight change in the allocation routine,
it is possible to reduce the memclass() macro to be a simple equality.

Another possible solution is that we balance only when the free memory
of a zone _and_ all its lower class zones falls below 1/64th of the
total memory in the zone and its lower class zones. This fixes the 2.2
balancing problem, and stays as close to 2.2 behavior as possible. Also,
the balancing algorithm works the same way on the various architectures,
which have different numbers and types of zones. If we wanted to get
fancy, we could assign different weights to free pages in different
zones in the future.
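
As a rough illustration only, the difference between the 2.2 global check
and the second solution described above can be sketched in C as follows.
This is not the kernel's code: struct sketch_zone, global_needs_balance()
and zone_needs_balance() are hypothetical names invented for this sketch,
and the zones are assumed to be stored lowest class first (dma, regular,
highmem).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified zone descriptor; not the kernel's zone struct. */
struct sketch_zone {
	unsigned long free_pages;	/* pages currently free in this zone */
	unsigned long total_pages;	/* pages managed by this zone */
};

/*
 * 2.2-style check: balance only when the _total_ number of free pages
 * falls below 1/64th of total memory, regardless of which zone is short.
 */
int global_needs_balance(const struct sketch_zone *zones, size_t nr)
{
	unsigned long free = 0, total = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		free += zones[i].free_pages;
		total += zones[i].total_pages;
	}
	return free < total / 64;
}

/*
 * Second solution: zone idx needs balancing when the free memory of
 * that zone _and_ all its lower class zones falls below 1/64th of the
 * total memory in those same zones.
 */
int zone_needs_balance(const struct sketch_zone *zones, size_t nr, size_t idx)
{
	unsigned long free = 0, total = 0;
	size_t i;

	assert(idx < nr);
	for (i = 0; i <= idx; i++) {
		free += zones[i].free_pages;
		total += zones[i].total_pages;
	}
	return free < total / 64;
}
```

With a 4096-page dma zone that is completely empty and a 61440-page
regular zone holding 2000 free pages, the global check sees 2000 free
pages against a threshold of 1024 and does nothing, while the per-zone
check flags the dma zone (0 free against a threshold of 64).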

Note that if the size of the regular zone is huge compared to dma zone,
it becomes less significant to consider the free dma pages while
deciding whether to balance the regular zone. The first solution
becomes more attractive then.

The appended patch implements the second solution. It also "fixes" two
problems: first, kswapd is woken up as in 2.2 on low memory conditions
for non-sleepable allocations. Second, the HIGHMEM zone is also balanced,
so as to give a fighting chance for replace_with_highmem() to get a
HIGHMEM page, as well as to ensure that HIGHMEM allocations do not
fall back into regular zone. This also makes sure that HIGHMEM pages
are not leaked (for example, in situations where a HIGHMEM page is in
the swapcache but is not being used by anyone).

kswapd also needs to know about the zones it should balance. kswapd is
primarily needed in a situation where balancing can not be done, 
probably because all allocation requests are coming from intr context
and all process contexts are sleeping. For 2.3, kswapd does not really
need to balance the highmem zone, since intr context does not request
highmem pages. kswapd looks at the zone_wake_kswapd field in the zone
structure to decide whether a zone needs balancing.

Page stealing from process memory and shm is done if stealing the page would
alleviate memory pressure on any zone in the page's node that has fallen below
its watermark.

watermark[WMARK_MIN/WMARK_LOW/WMARK_HIGH]/low_on_memory/zone_wake_kswapd: These
are per-zone fields, used to determine when a zone needs to be balanced. When
the number of free pages falls below watermark[WMARK_MIN], the hysteretic field
low_on_memory gets set. This stays set till the number of free pages becomes
watermark[WMARK_HIGH]. When low_on_memory is set, page allocation requests will
try to free some pages in the zone (provided __GFP_WAIT is set in the request).
Orthogonal to this is the decision to poke kswapd to free some zone pages.
That decision is not hysteresis based, and is made when the number of free
pages is below watermark[WMARK_LOW], in which case zone_wake_kswapd is also set.
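
The interplay of the two flags may be easier to follow as code. The
following is a minimal sketch, not the kernel implementation: struct
sketch_zone_state and update_zone_flags() are hypothetical names, and the
sketch assumes that zone_wake_kswapd is simply recomputed on every call,
which the text above does not spell out.

```c
#include <assert.h>
#include <stdbool.h>

enum { WMARK_MIN, WMARK_LOW, WMARK_HIGH, NR_WMARK };

/* Hypothetical per-zone state; field names follow the text above. */
struct sketch_zone_state {
	unsigned long free_pages;
	unsigned long watermark[NR_WMARK];
	bool low_on_memory;	/* hysteretic flag */
	bool zone_wake_kswapd;	/* plain threshold flag */
};

/* Recompute both flags after free_pages has changed. */
void update_zone_flags(struct sketch_zone_state *z)
{
	assert(z->watermark[WMARK_MIN] <= z->watermark[WMARK_LOW]);
	assert(z->watermark[WMARK_LOW] <= z->watermark[WMARK_HIGH]);

	/*
	 * Hysteresis: set below WMARK_MIN, cleared only once free pages
	 * reach WMARK_HIGH; in between, the previous value is kept.
	 */
	if (z->free_pages < z->watermark[WMARK_MIN])
		z->low_on_memory = true;
	else if (z->free_pages >= z->watermark[WMARK_HIGH])
		z->low_on_memory = false;

	/* No hysteresis (assumption: re-evaluated on every call). */
	z->zone_wake_kswapd = z->free_pages < z->watermark[WMARK_LOW];
}
```

With watermarks of 20/40/60 pages, a zone that drops to 10 free pages
sets both flags; when it recovers to 50 free pages, zone_wake_kswapd
clears immediately but low_on_memory stays set until 60 free pages are
reached.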


(Good) Ideas that I have heard:
1. Dynamic experience should influence balancing: number of failed requests
for a zone can be tracked and fed into the balancing scheme (jalvo@mbay.net)
2. Implement a replace_with_highmem()-like replace_with_regular() to preserve
dma pages. (lkd@tantalophile.demon.co.uk)

	How to Get Your Change Into the Linux Kernel
		or
	Care And Operation Of Your Linus Torvalds


For a person or company who wishes to submit a change to the Linux
kernel, the process can sometimes be daunting if you're not familiar
with "the system."  This text is a collection of suggestions which
can greatly increase the chances of your change being accepted.

If you are submitting a driver, also read Documentation/SubmittingDrivers.


--------------------------------------------
SECTION 1 - CREATING AND SENDING YOUR CHANGE
--------------------------------------------


1) "diff -up"
------------

Use "diff -up" or "diff -uprN" to create patches.

All changes to the Linux kernel occur in the form of patches, as
generated by diff(1).  When creating your patch, make sure to create it
in "unified diff" format, as supplied by the '-u' argument to diff(1).
Also, please use the '-p' argument which shows which C function each
change is in - that makes the resultant diff a lot easier to read.
Patches should be based in the root kernel source directory,
not in any lower subdirectory.

To create a patch for a single file, it is often sufficient to do:

	SRCTREE=linux-2.6
	MYFILE=drivers/net/mydriver.c

	cd $SRCTREE
	cp $MYFILE $MYFILE.orig
	vi $MYFILE	# make your change
	cd ..
	diff -up $SRCTREE/$MYFILE{.orig,} > /tmp/patch

To create a patch for multiple files, you should unpack a "vanilla",
or unmodified kernel source tree, and generate a diff against your
own source tree.  For example:

	MYSRC=/devel/linux-2.6

	tar xvfz linux-2.6.12.tar.gz
	mv linux-2.6.12 linux-2.6.12-vanilla
	diff -uprN -X linux-2.6.12-vanilla/Documentation/dontdiff \
		linux-2.6.12-vanilla $MYSRC > /tmp/patch

"dontdiff" is a list of files which are generated by the kernel during
the build process, and should be ignored in any diff(1)-generated
patch.  The "dontdiff" file is included in the kernel tree in
2.6.12 and later.  For earlier kernel versions, you can get it
from <http://www.xenotime.net/linux/doc/dontdiff>.

Make sure your patch does not include any extra files which do not
belong in a patch submission.  Make sure to review your patch -after-
generating it with diff(1), to ensure accuracy.

If your changes produce a lot of deltas, you may want to look into
splitting them into individual patches which modify things in
logical stages.  This will facilitate easier reviewing by other
kernel developers, very important if you want your patch accepted.
There are a number of scripts which can aid in this:

Quilt:
http://savannah.nongnu.org/projects/quilt

Randy Dunlap's patch scripts:
http://www.xenotime.net/linux/scripts/patching-scripts-002.tar.gz

Andrew Morton's patch scripts:
http://www.zip.com.au/~akpm/linux/patches/patch-scripts-0.20


2) Describe your changes.

Describe the technical detail of the change(s) your patch includes.

Be as specific as possible.  The WORST descriptions possible include
things like "update driver X", "bug fix for driver X", or "this patch
includes updates for subsystem X.  Please apply."

If your description starts to get long, that's a sign that you probably
need to split up your patch.  See #3, next.


3) Separate your changes.

Separate each logical change into its own patch.

For example, if your changes include both bug fixes and performance
enhancements for a single driver, separate those changes into two
or more patches.  If your changes include an API update, and a new
driver which uses that new API, separate those into two patches.

On the other hand, if you make a single change to numerous files,
group those changes into a single patch.  Thus a single logical change
is contained within a single patch.

If one patch depends on another patch in order for a change to be
complete, that is OK.  Simply note "this patch depends on patch X"
in your patch description.


4) Select e-mail destination.

Look through the MAINTAINERS file and the source code, and determine
if your change applies to a specific subsystem of the kernel, with
an assigned maintainer.  If so, e-mail that person.

If no maintainer is listed, or the maintainer does not respond, send
your patch to the primary Linux kernel developer's mailing list,
linux-kernel@vger.kernel.org.  Most kernel developers monitor this
e-mail list, and can comment on your changes.

Linus Torvalds is the final arbiter of all changes accepted into the
Linux kernel.  His e-mail address is <torvalds@osdl.org>.  He gets
a lot of e-mail, so typically you should do your best to -avoid- sending
him e-mail.

Patches which are bug fixes, are "obvious" changes, or similarly
require little discussion should be sent or CC'd to Linus.  Patches
which require discussion or do not have a clear advantage should
usually be sent first to linux-kernel.  Only after the patch is
discussed should the patch then be submitted to Linus.


5) Select your CC (e-mail carbon copy) list.

Unless you have a reason NOT to do so, CC linux-kernel@vger.kernel.org.

Other kernel developers besides Linus need to be aware of your change,
so that they may comment on it and offer code review and suggestions.
linux-kernel is the primary Linux kernel developer mailing list.
Other mailing lists are available for specific subsystems, such as
USB, framebuffer devices, the VFS, the SCSI subsystem, etc.  See the
MAINTAINERS file for a mailing list that relates specifically to
your change.

If changes affect userland-kernel interfaces, please send
the MAN-PAGES maintainer (as listed in the MAINTAINERS file)
a man-pages patch, or at least a notification of the change,
so that some information makes its way into the manual pages.

Even if the maintainer did not respond in step #4, make sure to ALWAYS
copy the maintainer when you change their code.

For small patches you may want to CC the Trivial Patch Monkey,
trivial@rustcorp.com.au, set up by Rusty Russell, which collects "trivial"
patches. Trivial patches must qualify for one of the following rules:
 Spelling fixes in documentation
 Spelling fixes which could break grep(1).
 Warning fixes (cluttering with useless warnings is bad)
 Compilation fixes (only if they are actually correct)
 Runtime fixes (only if they actually fix things)
 Removing use of deprecated functions/macros (eg. check_region).
 Contact detail and documentation fixes
 Non-portable code replaced by portable code (even in arch-specific,
 since people copy, as long as it's trivial)
 Any fix by the author/maintainer of the file. (ie. patch monkey
 in re-transmission mode)
URL: <http://www.kernel.org/pub/linux/kernel/people/rusty/trivial/>


6) No MIME, no links, no compression, no attachments.  Just plain text.

Linus and other kernel developers need to be able to read and comment
on the changes you are submitting.  It is important for a kernel
developer to be able to "quote" your changes, using standard e-mail
tools, so that they may comment on specific portions of your code.

For this reason, all patches should be submitted by e-mail "inline".
WARNING:  Be wary of your editor's word-wrap corrupting your patch,
if you choose to cut-n-paste your patch.

Do not attach the patch as a MIME attachment, compressed or not.
Many popular e-mail applications will not always transmit a MIME
attachment as plain text, making it impossible to comment on your
code.  A MIME attachment also takes Linus a bit more time to process,
decreasing the likelihood of your MIME-attached change being accepted.

Exception:  If your mailer is mangling patches then someone may ask
you to re-send them using MIME.


7) E-mail size.

When sending patches to Linus, always follow step #6.

Large changes are not appropriate for mailing lists, and some
maintainers.  If your patch, uncompressed, exceeds 40 kB in size,
it is preferred that you store your patch on an Internet-accessible
server, and provide instead a URL (link) pointing to your patch.


8) Name your kernel version.

It is important to note, either in the subject line or in the patch
description, the kernel version to which this patch applies.

If the patch does not apply cleanly to the latest kernel version,
Linus will not apply it.


9) Don't get discouraged.  Re-submit.

After you have submitted your change, be patient and wait.  If Linus
likes your change and applies it, it will appear in the next version
of the kernel that he releases.

However, if your change doesn't appear in the next version of the
kernel, there could be any number of reasons.  It's YOUR job to
narrow down those reasons, correct what was wrong, and submit your
updated change.

It is quite common for Linus to "drop" your patch without comment.
That's the nature of the system.  If he drops your patch, it could be
due to
* Your patch did not apply cleanly to the latest kernel version
* Your patch was not sufficiently discussed on linux-kernel.
* A style issue (see section 2),
* An e-mail formatting issue (re-read this section)
* A technical problem with your change
* He gets tons of e-mail, and yours got lost in the shuffle
* You are being annoying (See Figure 1)

When in doubt, solicit comments on linux-kernel mailing list.


10) Include PATCH in the subject

Due to high e-mail traffic to Linus, and to linux-kernel, it is common
convention to prefix your subject line with [PATCH].  This lets Linus
and other kernel developers more easily distinguish patches from other
e-mail discussions.


11) Sign your work

To improve tracking of who did what, especially with patches that can
percolate to their final resting place in the kernel through several
layers of maintainers, we've introduced a "sign-off" procedure on
patches that are being emailed around.

The sign-off is a simple line at the end of the explanation for the
patch, which certifies that you wrote it or otherwise have the right to
pass it on as an open-source patch.  The rules are pretty simple: if you
can certify the below:

        Developer's Certificate of Origin 1.1

        By making a contribution to this project, I certify that:

        (a) The contribution was created in whole or in part by me and I
            have the right to submit it under the open source license
            indicated in the file; or

        (b) The contribution is based upon previous work that, to the best
            of my knowledge, is covered under an appropriate open source
            license and I have the right under that license to submit that
            work with modifications, whether created in whole or in part
            by me, under the same open source license (unless I am
            permitted to submit under a different license), as indicated
            in the file; or

        (c) The contribution was provided directly to me by some other
            person who certified (a), (b) or (c) and I have not modified
            it.

	(d) I understand and agree that this project and the contribution
	    are public and that a record of the contribution (including all
	    personal information I submit with it, including my sign-off) is
	    maintained indefinitely and may be redistributed consistent with
	    this project or the open source license(s) involved.

then you just add a line saying

	Signed-off-by: Random J Developer <random@developer.example.org>

Some people also put extra tags at the end.  They'll just be ignored for
now, but you can do this to mark internal company procedures or just
point out some special detail about the sign-off. 


12) More references for submitting patches

Andrew Morton, "The perfect patch" (tpp).
  <http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt>

Jeff Garzik, "Linux kernel patch submission format."
  <http://linux.yyz.us/patch-format.html>


-----------------------------------
SECTION 2 - HINTS, TIPS, AND TRICKS
-----------------------------------

This section lists many of the common "rules" associated with code
submitted to the kernel.  There are always exceptions... but you must
have a really good reason for making one.  You could probably call this
section Linus Computer Science 101.


1) Read Documentation/CodingStyle

Nuff said.  If your code deviates too much from this, it is likely
to be rejected without further review, and without comment.


2) #ifdefs are ugly

Code cluttered with ifdefs is difficult to read and maintain.  Don't do
it.  Instead, put your ifdefs in a header, and conditionally define
'static inline' functions, or macros, which are used in the code.
Let the compiler optimize away the "no-op" case.

Simple example, of poor code:

	dev = alloc_etherdev (sizeof(struct funky_private));
	if (!dev)
		return -ENODEV;
	#ifdef CONFIG_NET_FUNKINESS
	init_funky_net(dev);
	#endif

Cleaned-up example:

(in header)
	#ifndef CONFIG_NET_FUNKINESS
	static inline void init_funky_net (struct net_device *d) {}
	#endif

(in the code itself)
	dev = alloc_etherdev (sizeof(struct funky_private));
	if (!dev)
		return -ENODEV;