aboutsummaryrefslogtreecommitdiffstats
path: root/include/linux
Commit message (Collapse)AuthorAge
* [PATCH] kdump: Retrieve saved max pfnVivek Goyal2005-06-25
| | | | | | | | | | | This patch retrieves the max_pfn being used by previous kernel and stores it in a safe location (saved_max_pfn) before it is overwritten due to user defined memory map. This pfn is used to make sure that user does not try to read the physical memory beyond saved_max_pfn. Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] kexec: s390 supportHeiko Carstens2005-06-25
| | | | | | | | | | | | | | | | Add kexec support for s390 architecture. From: Milton Miller <miltonm@bga.com> - Fix passing of first argument to relocate_kernel assembly. - Fix Kconfig description. - Remove wrong comment and comments that describe obvious things. - Allow only KEXEC_TYPE_DEFAULT as image type -> dump not supported. Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] kexec: add kexec syscallsEric W. Biederman2005-06-25
| | | | | | | | | | | | | | | | | | | | This patch introduces the architecture independent implementation the sys_kexec_load, the compat_sys_kexec_load system calls. Kexec on panic support has been integrated into the core patch and is relatively clean. In addition the hopefully architecture independent option crashkernel=size@location has been docuemented. It's purpose is to reserve space for the panic kernel to live, and where no DMA transfer will ever be setup to access. Signed-off-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Alexander Nyberg <alexn@telia.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] sched: voluntary kernel preemptionIngo Molnar2005-06-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a new preemption model: 'Voluntary Kernel Preemption'. The 3 models can be selected from a new menu: (X) No Forced Preemption (Server) ( ) Voluntary Kernel Preemption (Desktop) ( ) Preemptible Kernel (Low-Latency Desktop) we still default to the stock (Server) preemption model. Voluntary preemption works by adding a cond_resched() (reschedule-if-needed) call to every might_sleep() check. It is lighter than CONFIG_PREEMPT - at the cost of not having as tight latencies. It represents a different latency/complexity/overhead tradeoff. It has no runtime impact at all if disabled. Here are size stats that show how the various preemption models impact the kernel's size: text data bss dec hex filename 3618774 547184 179896 4345854 424ffe vmlinux.stock 3626406 547184 179896 4353486 426dce vmlinux.voluntary +0.2% 3748414 548640 179896 4476950 445016 vmlinux.preempt +3.5% voluntary-preempt is +0.2% of .text, preempt is +3.5%. This feature has been tested for many months by lots of people (and it's also included in the RHEL4 distribution and earlier variants were in Fedora as well), and it's intended for users and distributions who dont want to use full-blown CONFIG_PREEMPT for one reason or another. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Dynamic sched domains: sched changesDinakar Guniguntala2005-06-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following patches add dynamic sched domains functionality that was extensively discussed on lkml and lse-tech. I would like to see this added to -mm o The main advantage with this feature is that it ensures that the scheduler load balacing code only balances against the cpus that are in the sched domain as defined by an exclusive cpuset and not all of the cpus in the system. This removes any overhead due to load balancing code trying to pull tasks outside of the cpu exclusive cpuset only to be prevented by the tasks' cpus_allowed mask. o cpu exclusive cpusets are useful for servers running orthogonal workloads such as RT applications requiring low latency and HPC applications that are throughput sensitive o It provides a new API partition_sched_domains in sched.c that makes dynamic sched domains possible. o cpu_exclusive cpusets sets are now associated with a sched domain. Which means that the users can dynamically modify the sched domains through the cpuset file system interface o ia64 sched domain code has been updated to support this feature as well o Currently, this does not support hotplug. (However some of my tests indicate hotplug+preempt is currently broken) o I have tested it extensively on x86. o This should have very minimal impact on performance as none of the fast paths are affected Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Acked-by: Paul Jackson <pj@sgi.com> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Acked-by: Matthew Dobson <colpatch@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] sched: consolidate sbe sbfNick Piggin2005-06-25
| | | | | | | | | | | | | | | | Consolidate balance-on-exec with balance-on-fork. This is made easy by the sched-domains RCU patches. As well as the general goodness of code reduction, this allows the runqueues to be unlocked during balance-on-fork. schedstats is a problem. Maybe just have balance-on-event instead of distinguishing fork and exec? Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] sched: cleanup context switch lockingNick Piggin2005-06-25
| | | | | | | | | | | | | | | | Instead of requiring architecture code to interact with the scheduler's locking implementation, provide a couple of defines that can be used by the architecture to request runqueue unlocked context switches, and ask for interrupts to be enabled over the context switch. Also replaces the "switch_lock" used by these architectures with an oncpu flag (note, not a potentially slow bitflag). This eliminates one bus locked memory operation when context switching, and simplifies the task_running function. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] sched: sched tuningNick Piggin2005-06-25
| | | | | | | | Do some basic initial tuning. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] sched: schedstats update for balance on forkNick Piggin2005-06-25
| | | | | | | | Add SCHEDSTAT statistics for sched-balance-fork. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] sched: balance on forkNick Piggin2005-06-25
| | | | | | | | | | | | | | | | | | Reimplement the balance on exec balancing to be sched-domains aware. Use this to also do balance on fork balancing. Make x86_64 do balance on fork over the NUMA domain. The problem that the non sched domains aware blancing became apparent on dual core, multi socket opterons. What we want is for the new tasks to be sent to a different socket, but more often than not, we would first load up our sibling core, or fill two cores of a single remote socket before selecting a new one. This gives large improvements to STREAM on such systems. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] sched: no aggressive idle balancingNick Piggin2005-06-25
| | | | | | | | | | Remove the very aggressive idle stuff that has recently gone into 2.6 - it is going against the direction we are trying to go. Hopefully we can regain performance through other methods. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] sched: balance timersNick Piggin2005-06-25
| | | | | | | | | | | | | Do CPU load averaging over a number of different intervals. Allow each interval to be chosen by sending a parameter to source_load and target_load. 0 is instantaneous, idx > 0 returns a decaying average with the most recent sample weighted at 2^(idx-1). To a maximum of 3 (could be easily increased). So generally a higher number will result in more conservative balancing. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] RCU: clean up a few remaining synchronize_kernel() callsPaul E. McKenney2005-06-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 2.6.12-rc6-mm1 has a few remaining synchronize_kernel()s, some (but not all) in comments. This patch changes these synchronize_kernel() calls (and comments) to synchronize_rcu() or synchronize_sched() as follows: - arch/x86_64/kernel/mce.c mce_read(): change to synchronize_sched() to handle races with machine-check exceptions (synchronize_rcu() would not cut it given RCU implementations intended for hardcore realtime use. - drivers/input/serio/i8042.c i8042_stop(): change to synchronize_sched() to handle races with i8042_interrupt() interrupt handler. Again, synchronize_rcu() would not cut it given RCU implementations intended for hardcore realtime use. - include/*/kdebug.h comments: change to synchronize_sched() to handle races with NMIs. As before, synchronize_rcu() would not cut it... - include/linux/list.h comment: change to synchronize_rcu(), since this comment is for list_del_rcu(). - security/keys/key.c unregister_key_type(): change to synchronize_rcu(), since this is interacting with RCU read side. - security/keys/process_keys.c install_session_keyring(): change to synchronize_rcu(), since this is interacting with RCU read side. Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] properly stop devices before poweroffPavel Machek2005-06-25
| | | | | | | | | Without this patch, Linux provokes emergency disk shutdowns and similar nastiness. It was in SuSE kernels for some time, IIRC. Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] suspend/resume SMP supportLi Shaohua2005-06-25
| | | | | | | | | | Using CPU hotplug to support suspend/resume SMP. Both S3 and S4 use disable/enable_nonboot_cpus API. The S4 part is based on Pavel's original S4 SMP patch. Signed-off-by: Li Shaohua<shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86_64: Change init sections for CPU hotplug supportAshok Raj2005-06-25
| | | | | | | | | | | | | | | | | | | | | This patch adds __cpuinit and __cpuinitdata sections that need to exist past boot to support cpu hotplug. Caveat: This is done *only* for EM64T CPU Hotplug support, on request from Andi Kleen. Much of the generic hotplug code in kernel, and none of the other archs that support CPU hotplug today, i386, ia64, ppc64, s390 and parisc dont mark sections with __cpuinit, but only mark them as __devinit, and __devinitdata. If someone is motivated to change generic code, we need to make sure all existing hotplug code does not break, on other arch's that dont use __cpuinit, and __cpudevinit. Signed-off-by: Ashok Raj <ashok.raj@intel.com> Acked-by: Andi Kleen <ak@muc.de> Acked-by: Zwane Mwaikambo <zwane@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] make smp_prepare_cpu to a weak functionAshok Raj2005-06-25
| | | | | | | | | | | | | | I really wish smp_prepare_cpu() would disappear eventually. In the interim this is ideally a weak function, so we dont end up changing several places to define this dummy in headers. Today since the dummy declaration is done only in drivers/base/cpu.c but the function is called in kernel/power/smp.c i get undefined reference in my cpu hotplug code for x86_64 under development. Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] I8K: use standard DMI interfaceDmitry Torokhov2005-06-25
| | | | | | | | | | | | I8K: Change to use stock dmi infrastructure instead of homegrown parsing code. The driver now requires box's DMI data to match list of supported models so driver can be safely compiled-in by default without fear of it poking into random SMM BIOS code. DMI checks can be ignored with i8k.ignore_dmi option. Signed-off-by: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] fs/qnx4/*: fix sparse warningsAlexey Dobriyan2005-06-24
| | | | | | | | | | This patch fixes sparse warnings in the qnx4fs (and might even make qnx4fs work on big-endian boxes) Signed-off-by: Alexey Dobriyan <adobriyan@mail.ru> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: Anders Larsen <al@alarsen.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2005-06-24
|\
| * [PKT_SCHED]: Packet classification based on textsearch (ematch)Thomas Graf2005-06-24
| | | | | | | | | | Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NET]: skb_find_text() - Find a text pattern in skb dataThomas Graf2005-06-24
| | | | | | | | | | | | | | | | | | | | Finds a pattern in the skb data according to the specified textsearch configuration. Use textsearch_next() to retrieve subsequent occurrences of the pattern. Returns the offset to the first occurrence or UINT_MAX if no match was found. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NET]: Zerocopy sequential reading of skb dataThomas Graf2005-06-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implements sequential reading for both linear and non-linear skb data at zerocopy cost. The data is returned in chunks of arbitary length, therefore random access is not possible. Usage: from := 0 to := 128 state := undef data := undef len := undef consumed := 0 skb_prepare_seq_read(skb, from, to, &state) while (len = skb_seq_read(consumed, &data, &state)) != 0 do /* do something with 'data' of length 'len' */ if abort then /* abort read if we don't wait for * skb_seq_read() to return 0 */ skb_abort_seq_read(&state) return endif /* not necessary to consume all of 'len' */ consumed += len done Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [LIB]: Naive finite state machine based textsearchThomas Graf2005-06-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A finite state machine consists of n states (struct ts_fsm_token) representing the pattern as a finite automation. The data is read sequentially on a octet basis. Every state token specifies the number of recurrences and the type of value accepted which can be either a specific character or ctype based set of characters. The available type of recurrences include 1, (0|1), [0 n], and [1 n]. The algorithm differs between strict/non-strict mode specyfing whether the pattern has to start at the first octect. Strict mode is enabled by default and can be disabled by inserting TS_FSM_HEAD_IGNORE as the first token in the chain. The runtime performance of the algorithm should be around O(n), however while in strict mode the average runtime can be better. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [LIB]: Textsearch infrastructure.Thomas Graf2005-06-23
| | | | | | | | | | | | | | | | | | | | The textsearch infrastructure provides text searching facitilies for both linear and non-linear data. Individual search algorithms are implemented in modules and chosen by the user. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [TCP]: Allow choosing TCP congestion control via sockopt.Stephen Hemminger2005-06-23
| | | | | | | | | | | | | | | | Allow using setsockopt to set TCP congestion control to use on a per socket basis. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NET]: Separate two usages of netdev_max_backlog.Stephen Hemminger2005-06-23
| | | | | | | | | | | | | | | | | | | | | | | | Separate out the two uses of netdev_max_backlog. One controls the upper bound on packets processed per softirq, the new name for this is netdev_budget; the other controls the limit on packets queued via netif_rx. Increase the max_backlog default to account for faster processors. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NET]: Eliminate netif_rx massive packet drops.Stephen Hemminger2005-06-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Eliminate the throttling behaviour when the netif receive queue fills because it behaves badly when using high speed networks under load. The throttling cause multiple packet drops that cause TCP to go into slow start mode. The same effective patch has been part of BIC TCP and H-TCP as well as part of Web100. The existing code drops 100's of packets when the queue fills; this changes it to individual packet drop-tail. Signed-off-by: Stephen Hemmminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NET]: Remove obsolete netif_rx congestion sensing mechanism.Stephen Hemminger2005-06-23
| | | | | | | | | | | | | | | | | | | | | | | | | | Remove the congestion sensing mechanism from netif_rx, and always return either full or empty. Almost no driver checks the return value from netif_rx, and those that do only use it for debug messages. The original design of netif_rx was to do flow control based on the receive queue, but NAPI has supplanted this and no driver uses the feedback. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NET]: Remove obsolete fastroute stats.Stephen Hemminger2005-06-23
| | | | | | | | | | | | | | Remove last vestiages of fastroute code that is no longer used. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | [PATCH] make various thing staticAdrian Bunk2005-06-24
| | | | | | | | | | | | | | | | Another rollup of patches which give various symbols static scope Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] xip: reduce code duplicationCarsten Otte2005-06-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch reworks filemap_xip.c with the goal to reduce code duplication from mm/filemap.c. It applies agains 2.6.12-rc6-mm1. Instead of implementing the aio functions, this one implements the synchronous read/write functions only. For readv and writev, the generic fallback is used. For aio, we rely on the application doing the fallback. Since our "synchronous" function does memcpy immediately anyway, there is no performance difference between using the fallbacks or implementing each operation. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] xip: ext2: execute in placeCarsten Otte2005-06-24
| | | | | | | | | | | | | | | | | | These are the ext2 related parts. Ext2 now uses the xip_* file operations along with the get_xip_page aop when mounted with -o xip. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] xip: fs/mm: execute in placeCarsten Otte2005-06-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - generic_file* file operations do no longer have a xip/non-xip split - filemap_xip.c implements a new set of fops that require get_xip_page aop to work proper. all new fops are exported GPL-only (don't like to see whatever code use those except GPL modules) - __xip_unmap now uses page_check_address, which is no longer static in rmap.c, and defined in linux/rmap.h - mm/filemap.h is now much more clean, plainly having just Linus' inline funcs moved here from filemap.c - fix includes in filemap_xip to make it build cleanly on i386 Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] xip: bdev: execute in placeCarsten Otte2005-06-24
| | | | | | | | | | | | | | | | | | This is the block device related part. The block device operation direct_access now has a struct block_device as first parameter. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] modules: add version and srcversion to sysfsMatt Domsch2005-06-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds version and srcversion files to /sys/module/${modulename} containing the version and srcversion fields of the module's modinfo section (if present). /sys/module/e1000 |-- srcversion `-- version This patch differs slightly from the version posted in January, as it now uses the new kstrdup() call in -mm. Why put this in sysfs? a) Tools like DKMS, which deal with changing out individual kernel modules without replacing the whole kernel, can behave smarter if they can tell the version of a given module. The autoinstaller feature, for example, which determines if your system has a "good" version of a driver (i.e. if the one provided by DKMS has a newer verson than that provided by the kernel package installed), and to automatically compile and install a newer version if DKMS has it but your kernel doesn't yet have that version. b) Because sysadmins manually, or with tools like DKMS, can switch out modules on the file system, you can't count on 'modinfo foo.ko', which looks at /lib/modules/${kernelver}/... actually matching what is loaded into the kernel already. Hence asking sysfs for this. c) as the unbind-driver-from-device work takes shape, it will be possible to rebind a driver that's built-in (no .ko to modinfo for the version) to a newly loaded module. sysfs will have the currently-built-in version info, for comparison. d) tech support scripts can then easily grab the version info for what's running presently - a question I get often. There has been renewed interest in this patch on linux-scsi by driver authors. As the idea originated from GregKH, I leave his Signed-off-by: intact, though the implementation is nearly completely new. Compiled and run on x86 and x86_64. From: Matthew Dobson <colpatch@us.ibm.com> build fix From: Thierry Vignaud <tvignaud@mandriva.com> build fix From: Matthew Dobson <colpatch@us.ibm.com> warning fix Signed-off-by: Greg Kroah-Hartman <greg@kroah.com> Signed-off-by: Matt Domsch <Matt_Domsch@dell.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] knfsd: nfsd4 reboot dirname fixNeilBrown2005-06-24
| | | | | | | | | | | | | | | | | | | | | | | | Set the recovery directory via /proc/fs/nfsd/nfs4recoverydir. It may be changed any time, but is used only on startup. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] knfsd: nfsd4: reboot recoveryNeilBrown2005-06-24
| | | | | | | | | | | | | | | | | | | | | | This patch adds the code to create and remove client subdirectories from the recovery directory, as described in the previous patch comment. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] knfsd: nfsd4: initialize recovery directoryNeilBrown2005-06-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFSv4 clients are required to know what state they have on the server so that they can reclaim it on server reboot. However, it is possible for pathalogical combinations of server reboots and network partitions to leave a client in a state where it cannot know whether it has lost its state on the server. For this reason, rfc3530 requires that we store some information about clients to stable storage. So we maintain a directory /var/lib/nfs/v4recovery with a subdirectory for each client with active state. We leave open the possibility of including files underneath each such subdirectory with information about the client, but for now the subdirectories are empty. We create a client subdirectory whenever a client makes its first non-reclaim open_confirm. We remove a client subdirectory whenever either a) its lease expires, or b) the grace period ends without it reclaiming anything. When handling reclaims, we allow the reclaim if and only if the client doing the reclaim has a subdirectory. This patch adds just the code to scan the recovery directory on nfsd startup. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] knfsd: nfsd4: remove cb_parsedNeilBrown2005-06-24
| | | | | | | | | | | | | | | | | | | | | | | | The cb_parsed field is only used by probe_callback, to determine whether the callback information has been filled in by setclientid. But there is no way that probe_callback() can be called without that having already happened, so that check is superfluous, as is cb_parsed. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] knfsd: nfsd4: rename state list fieldsNeilBrown2005-06-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Trivial renaming patch: I can never remember, while looking at various lists relating the nfsd4 state structures, which are the "heads" and which are items on other lists, or which structures are actually on the various lists. The following convention helps me: given structures foo and bar, with foo containing the head of a list of bars, use "bars" for the name of the head of the list contained in the struct foo, and use "per_foo" for the entries in the struct bars. Already done for struct nfs4_file; go ahead and do it for the other nfsd4 state structures. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] knfsd: nfsd4: make needlessly global code staticNeilBrown2005-06-24
| | | | | | | | | | | | | | | | | | | | | | | | This patch contains the following possible cleanups: - make needlessly global code static Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] knfsd: nfsd4: reboot hashNeilBrown2005-06-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For the purposes of reboot recovery we keep a directory with subdirectories each having a name that is the ascii hex representation of the md5 sum of a client identifier for an active client. This adds the code to calculate that name. We also use it for the purposes of comparing clients, so if someone ever manages to find two client names that are md5 collisions, then we'll return clid_inuse to the second. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] knfsd: nfsd4: idmap initializationNeilBrown2005-06-24
| | | | | | | | | | | | | | | | | | | | Adopt standard kernel style by defining a no-op function instead of putting ifdef's in the code where the function is called. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] knfsd: nfsd4: clean up state initializationNeilBrown2005-06-24
| | | | | | | | | | | | | | | | | | | | Separate out stuff that needs initialization on startup from stuff that only needs initialization on module init from static data. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] knfsd: nfsd4: rename nfs4_state_initNeilBrown2005-06-24
| | | | | | | | | | | | | | | | | | Somewhat gratuitous rename to simplify following patch. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] knfsd: nfsd4: delegation recoveryNeilBrown2005-06-24
| | | | | | | | | | | | | | | | | | Allow recovery of delegations after reboot. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] nfsd4: reference count struct nfs4_fileNeilBrown2005-06-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a struct kref to each nfs4_file and take a reference to it from each stateid and delegation that refers to it. The atomicity guarantees are overkill given that all this stuff is done under the single nfsd4 state lock, but a) we'd like finer-grained locking some day, and b) this simplifies the cleanup of the structures a bit, something that has previously been a bit complicated and bug-prone. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] nfsd4: rename nfs4_file fieldsNeilBrown2005-06-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Trivial renaming patch: I can never remember, while looking at various lists relating the nfsd4 state structures, which are the "heads" and which are items on other lists, or which structures are actually on the various lists. The following convention helps me: given structures foo and bar, with foo containing the head of a list of bars, use "bars" for the name of the head of the list contained in the struct foo, and use "per_foo" for the entries in the struct bars. Go ahead and do this for struct nfs4_file. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] nfsd4: fix fh_expire_typeNeilBrown2005-06-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We're returning NFS4_FH_NOEXPIRE_WITH_OPEN | NFS4_FH_VOL_RENAME for the fh_expire_type attribute. This is incorrect: 1. The spec actually only allows NOEXPIRE_WITH_OPEN when VOLATILE_ANY is also set. 2. Filehandles for open files can expire, if the file is removed and there is a reboot. 3. Filehandles are only volatile on rename in the nosubtree check case. Unfortunately, there's no way to indicate that we only expire on remove. So our only choice is FH4_VOLATILE_ANY. Although it's redundant, we also set FH4_VOL_RENAME in the subtree check case, since subtreecheck does actually cause problems in practice and it seems possibly useful to give clients some way to distinguish that case. Fix a mispelled #define while we're at it. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>