aboutsummaryrefslogtreecommitdiffstats
path: root/include/linux
Commit message (Collapse)AuthorAge
* NFSv4: Add post-op attributes to NFSv4 write and commit callbacks.Trond Myklebust2005-10-27
| | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFSv4: Add post-op attributes to nfs4_proc_remove()Trond Myklebust2005-10-27
| | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFSv4: Add post-op attributes to nfs4_proc_rename()Trond Myklebust2005-10-27
| | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFSv4: Add post-op attributes to nfs4_proc_link()Trond Myklebust2005-10-27
| | | | | | | Optimise attribute revalidation when hardlinking. Add post-op attributes for the directory and the original inode. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Add optional post-op getattr instruction to the NFSv4 file close.Trond Myklebust2005-10-27
| | | | | | | | "Optional" means that the close call will not fail if the getattr at the end of the compound fails. If it does succeed, try to refresh inode attributes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFSv4: Add directory post-op attributes to the CREATE operations.Trond Myklebust2005-10-27
| | | | | | | | Since the directory attributes change every time we CREATE a file, we might as well pick up the new directory attributes in the same compound. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Don't let nfs_end_data_update() clobber attribute update informationTrond Myklebust2005-10-27
| | | | | | | | | | | | Since we almost always call nfs_end_data_update() after we called nfs_refresh_inode(), we now end up marking the inode metadata as needing revalidation immediately after having updated it. This patch rearranges things so that we mark the inode as needing revalidation _before_ we call nfs_refresh_inode() on those operations that need it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Optimise inode attribute cache updatesTrond Myklebust2005-10-27
| | | | | | | Allow nfs_refresh_inode() also to update attributes on the inode if the RPC call was sent after the last call to nfs_update_inode(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Convert cache_change_attribute into a jiffy-based valueTrond Myklebust2005-10-27
| | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Cleanup initialisation of struct nfs_fattrTrond Myklebust2005-10-27
| | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* Merge /home/trondmy/scm/kernel/git/torvalds/linux-2.6Trond Myklebust2005-10-27
|\
| * [SERIAL] new hp diva console portJustin Chen2005-10-24
| | | | | | | | | | | | | | | | Add the new ID 0x132a and configure the new PCI Diva console port. This device supports only 1 single console UART. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * [SERIAL] support the Exsys EX-4055 4S four-port cardBjorn Helgaas2005-10-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tested by Wolfgang Denk with this device: 00:0f.0 Network controller: PLX Technology, Inc. PCI <-> IOBus Bridge (rev 01) Subsystem: Exsys EX-4055 4S(16C550) RS-232 Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Interrupt: pin A routed to IRQ 10 Region 0: Memory at 80100000 (32-bit, non-prefetchable) [size=128] Region 1: I/O ports at 7080 [size=128] Region 2: I/O ports at 7400 [size=32] 00:0f.0 Class 0280: 10b5:9050 (rev 01) Subsystem: d84d:4055 Results with this patch: Serial: 8250/16550 driver $Revision: 1.90 $ 32 ports, IRQ sharing enabled ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A PCI: Found IRQ 10 for device 0000:00:0f.0 ttyS4 at I/O 0x7400 (irq = 10) is a 16550A ttyS5 at I/O 0x7408 (irq = 10) is a 16550A ttyS6 at I/O 0x7410 (irq = 10) is a 16550A ttyS7 at I/O 0x7418 (irq = 10) is a 16550A Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * [PATCH] inotify/idr leak fixAndrew Morton2005-10-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a bug which was reported and diagnosed by Stefan Jones <stefan.jones@churchillrandoms.co.uk> IDR trees include a cache of idr_layer objects. There's no way to destroy this cache, so when we discard an overall idr tree we end up leaking some memory. Add and use idr_destroy() for this. v9fs and infiniband also need to use idr_destroy() to avoid leaks. Or, we make the cache global, like radix_tree_preload(). Which is probably better. Later. Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org> Cc: Roland Dreier <rolandd@cisco.com> Cc: Robert Love <rml@novell.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * [PATCH] Fix handling spurious page fault for hugetlb regionHugh Dickins2005-10-20
| | | | | | | | | | | | | | | | | | | | This reverts commit 3359b54c8c07338f3a863d1109b42eebccdcf379 and replaces it with a cleaner version that is purely based on page table operations, so that the synchronization between inode size and hugetlb mappings becomes moot. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | Merge /home/trondmy/scm/kernel/git/torvalds/linux-2.6Trond Myklebust2005-10-20
|\|
| * [PATCH] swiotlb: make sure initial DMA allocations really are in DMA memoryYasunori Goto2005-10-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This introduces a limit parameter to the core bootmem allocator; The new parameter indicates that physical memory allocated by the bootmem allocator should be within the requested limit. We also introduce alloc_bootmem_low_pages_limit, alloc_bootmem_node_limit, alloc_bootmem_low_pages_node_limit apis, but alloc_bootmem_low_pages_limit is the only api used for swiotlb. The existing alloc_bootmem_low_pages() api could instead have been changed and made to pass right limit to the core allocator. But that would make the patch more intrusive for 2.6.14, as other arches use alloc_bootmem_low_pages(). We may be done that post 2.6.14 as a cleanup. With this, swiotlb gets memory within 4G for both x86_64 and ia64 arches. Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Ravikiran G Thirumalai <kiran@scalex86.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * [PATCH] Handle spurious page fault for hugetlb regionSeth, Rohit2005-10-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hugetlb pages are currently pre-faulted. At the time of mmap of hugepages, we populate the new PTEs. It is possible that HW has already cached some of the unused PTEs internally. These stale entries never get a chance to be purged in existing control flow. This patch extends the check in page fault code for hugepages. Check if a faulted address falls with in size for the hugetlb file backing it. We return VM_FAULT_MINOR for these cases (assuming that the arch specific page-faulting code purges the stale entry for the archs that need it). Signed-off-by: Rohit Seth <rohit.seth@intel.com> [ This is apparently arguably an ia64 port bug. But the code won't hurt, and for now it fixes a real problem on some ia64 machines ] Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | RPCSEC_GSS: krb5 cleanupJ. Bruce Fields2005-10-19
| | | | | | | | | | | | | | Remove some senseless wrappers. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | RPCSEC_GSS remove all qop parametersJ. Bruce Fields2005-10-19
| | | | | | | | | | | | | | | | | | Not only are the qop parameters that are passed around throughout the gssapi unused by any currently implemented mechanism, but there appears to be some doubt as to whether they will ever be used. Let's just kill them off for now. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | RPCSEC_GSS: Add support for privacy to krb5 rpcsec_gss mechanism.J. Bruce Fields2005-10-19
| | | | | | | | | | | | | | Add support for privacy to the krb5 rpcsec_gss mechanism. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | RPCSEC_GSS: krb5 pre-privacy cleanupJ. Bruce Fields2005-10-19
| | | | | | | | | | | | | | | | | | | | | | | | | | The code this was originally derived from processed wrap and mic tokens using the same functions. This required some contortions, and more would be required with the addition of xdr_buf's, so it's better to separate out the two code paths. In preparation for adding privacy support, remove the last vestiges of the old wrap token code. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | RPCSEC_GSS: cleanup au_rslack calculationJ. Bruce Fields2005-10-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Various xdr encode routines use au_rslack to guess where the reply argument will end up, so we can set up the xdr_buf to recieve data into the right place for zero copy. Currently we calculate the au_rslack estimate when we check the verifier. Normally this only depends on the verifier size. In the integrity case we add a few bytes to allow for a length and sequence number. It's a bit simpler to calculate only the verifier size when we check the verifier, and delay the full calculation till we unwrap. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | SUNRPC: Provide a callback to allow free pages allocated during xdr encodingJ. Bruce Fields2005-10-19
| | | | | | | | | | | | | | | | | | For privacy, we need to allocate pages to store the encrypted data (passed in pages can't be used without the risk of corrupting data in the page cache). So we need a way to free that memory after the request has been transmitted. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | SUNRPC: Add support for privacy to generic gss-api code.J. Bruce Fields2005-10-19
| | | | | | | | | | | | | | | | | | Add support for privacy to generic gss-api code. This is dead code until we have both a mechanism that supports privacy and code in the client or server that uses it. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | NFSv4: Eliminate nfsv4 open race...Trond Myklebust2005-10-18
| | | | | | | | | | | | | | Make NFSv4 return the fully initialized file pointer with the stateid that it created in the lookup w/intent. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | VFS: Allow the filesystem to return a full file pointer on open intentTrond Myklebust2005-10-18
| | | | | | | | | | | | | | | | | | This is needed by NFSv4 for atomicity reasons: our open command is in fact a lookup+open, so we need to be able to propagate open context information from lookup() into the resulting struct file's private_data field. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | NFSv4: Fix up handling of open_to_lock sequence idsTrond Myklebust2005-10-18
| | | | | | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | NFSv4: Make NFS clean up byte range locks asynchronouslyTrond Myklebust2005-10-18
| | | | | | | | | | | | Currently we fail to do so if the process was signalled. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | NFSv4: Fix a potential CLOSE raceTrond Myklebust2005-10-18
| | | | | | | | | | | | | | | | | | | | Once the state_owner and lock_owner semaphores get removed, it will be possible for other OPEN requests to reopen the same file if they have lower sequence ids than our CLOSE call. This patch ensures that we recheck the file state once nfs_wait_on_sequence() has completed waiting. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | NFSv4: Add functions to order RPC callsTrond Myklebust2005-10-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFSv4 file state-changing functions such as OPEN, CLOSE, LOCK,... are all labelled with "sequence identifiers" in order to prevent the server from reordering RPC requests, as this could cause its file state to become out of sync with the client. Currently the NFS client code enforces this ordering locally using semaphores to restrict access to structures until the RPC call is done. This, of course, only works with synchronous RPC calls, since the user process must first grab the semaphore. By dropping semaphores, and instead teaching the RPC engine to hold the RPC calls until they are ready to be sent, we can extend this process to work nicely with asynchronous RPC calls too. This patch adds a new list called "rpc_sequence" that defines the order of the RPC calls to be sent. We add one such list for each state_owner. When an RPC call is ready to be sent, it checks if it is top of the rpc_sequence list. If so, it proceeds. If not, it goes back to sleep, and loops until it hits top of the list. Once the RPC call has completed, it can then bump the sequence id counter, and remove itself from the rpc_sequence list, and then wake up the next sleeper. Note that the state_owner sequence ids and lock_owner sequence ids are all indexed to the same rpc_sequence list, so OPEN, LOCK,... requests are all ordered w.r.t. each other. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | RPC: allow call_encode() to delay transmission of an RPC call.Trond Myklebust2005-10-18
| | | | | | | | | | | | | | | | Currently, call_encode will cause the entire RPC call to abort if it returns an error. This is unnecessarily rigid, and gets in the way of attempts to allow the NFSv4 layer to order RPC calls that carry sequence ids. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | Merge /home/trondmy/scm/kernel/git/torvalds/linux-2.6Trond Myklebust2005-10-18
|\|
| * [PATCH] aio: revert lock_kiocb()Zach Brown2005-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lock_kiocb() was introduced to serialize retrying and cancellation. In the process of doing so it tried to sleep waiting for KIF_LOCKED while holding the ctx_lock spinlock. Recent fixes have ensured that multiple concurrent retries won't be attempted for a given iocb. Cancel has other problems and has no significant in-tree users that have been complaining about it. So for the immediate future we'll revert sleeping with the lock held and will address proper cancellation and retry serialization in the future. Signed-off-by: Zach Brown <zach.brown@oracle.com> Acked-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * [PATCH] rcu: keep rcu callback event counterEric Dumazet2005-10-17
| | | | | | | | | | | | | | | | | | This makes call_rcu() keep track of how many events there are on the RCU list, and cause a reschedule event when the list gets too long. This helps keep RCU event lists down. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * [PATCH] list: add missing rcu_dereference on first elementHerbert Xu2005-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It seems that all the list_*_rcu primitives are missing a memory barrier on the very first dereference. For example, #define list_for_each_rcu(pos, head) \ for (pos = (head)->next; prefetch(pos->next), pos != (head); \ pos = rcu_dereference(pos->next)) It will go something like: pos = (head)->next prefetch(pos->next) pos != (head) do stuff We're missing a barrier here. pos = rcu_dereference(pos->next) fetch pos->next barrier given by rcu_dereference(pos->next) store pos Without the missing barrier, the pos->next value may turn out to be stale. In fact, if "do stuff" were also dereferencing pos and relying on list_for_each_rcu to provide the barrier then it may also break. So here is a patch to make sure that we have a barrier for the first element in the list. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * [PATCH]: highest_possible_processor_id() has to be a macroAl Viro2005-10-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... otherwise, things like alpha and sparc64 break and break badly. They define cpu_possible_map to something else in smp.h *AFTER* having included cpumask.h. If that puppy is a macro, expansion will happen at the actual caller, when we'd already seen #define cpu_possible_map ... and we will get the right thing used. As an inline helper it will be tokenized before we get to that define and that's it; no matter what we define later, it won't affect anything. We get modules with dependency on cpu_possible_map instead of the right symbol (phys_cpu_present_map in case of sparc64), or outright link errors if they are built-in. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [PATCH] Fix copy-and-paste error in BSD accountingTim Schmielau2005-10-14
| | | | | | | | | | | | | | | | | | | | | | | | Fix copy and paste error in jiffies_to_AHZ conversion which leads to wrong BSD accounting information on alpha and ia64 when CONFIG_BSD_PROCESS_ACCT_V3 is turned on. Also update comment to match reorganised header files. Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * [NETFILTER]: Fix OOPSes on machines with discontiguous cpu numbering.David S. Miller2005-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Original patch by Harald Welte, with feedback from Herbert Xu and testing by Sébastien Bernard. EBTABLES, ARP tables, and IP/IP6 tables all assume that cpus are numbered linearly. That is not necessarily true. This patch fixes that up by calculating the largest possible cpu number, and allocating enough per-cpu structure space given that. Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NETPOLL]: wrong return for null netpoll_poll_lock()Ben Dooks2005-10-12
| | | | | | | | | | | | | | | | | | | | When netpoll is not being used, the macro that defines the removed routing netpoll_poll_lock defines the return as zero, but the real routine returns a `void *` Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NETFILTER] ctnetlink: allow userspace to change TCP statePablo Neira Ayuso2005-10-11
| | | | | | | | | | | | | | | | | | | | | | This patch adds the ability of changing the state a TCP connection. I know that this must be used with care but it's required to provide a complete conntrack creation via conntrack_netlink. So I'll document this aspect on the upcoming docs. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NETFILTER]: Use only 32bit counters for CONNTRACK_ACCTHarald Welte2005-10-11
| | | | | | | | | | | | | | | | | | | | Initially we used 64bit counters for conntrack-based accounting, since we had no event mechanism to tell userspace that our counters are about to overflow. With nfnetlink_conntrack, we now have such a event mechanism and thus can save 16bytes per connection. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NETFILTER] ctnetlink: add one nesting level for TCP statePablo Neira Ayuso2005-10-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To keep consistency, the TCP private protocol information is nested attributes under CTA_PROTOINFO_TCP. This way the sequence of attributes to access the TCP state information looks like here below: CTA_PROTOINFO CTA_PROTOINFO_TCP CTA_PROTOINFO_TCP_STATE instead of: CTA_PROTOINFO CTA_PROTOINFO_TCP_STATE Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NETFILTER]: Add missing include to ip_conntrack_tuple.hHarald Welte2005-10-10
| | | | | | | | | | | | | | | | Without this #include, __be16 is not defined and userspace programs will break. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NETFILTER] nat: remove bogus structure memberHarald Welte2005-10-10
| | | | | | | | | | | | | | | | | | | | | | | | | | When 'rustynat' was merged in 2.6.12, the use of the "helper" pointer of struct ipt_nat_info was obsoleted, but the pointer not removed from the struct. This patch removes the pointer, thereby yet again shrinking struct ip_conntrack. Discovered-by: Rusty Russell <rusty@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NETFILTER] nfnetlink: use highest bit of nfa_type to indicate nested TLVHarald Welte2005-10-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As Henrik Nordstrom pointed out, all our efforts with "split endian" (i.e. host byte order tags, net byte order values) are useless, unless a parser can determine whether an attribute is nested or not. This patch steals the highest bit of nfattr.nfa_type to indicate whether the data payload contains a nested nfattr (1) or not (0). This will break userspace compatibility, but luckily no kernel with nfnetlink was released so far. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [PATCH] Fix signal sending in usbdevio on async URB completionHarald Welte2005-10-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a process issues an URB from userspace and (starts to) terminate before the URB comes back, we run into the issue described above. This is because the urb saves a pointer to "current" when it is posted to the device, but there's no guarantee that this pointer is still valid afterwards. In fact, there are three separate issues: 1) the pointer to "current" can become invalid, since the task could be completely gone when the URB completion comes back from the device. 2) Even if the saved task pointer is still pointing to a valid task_struct, task_struct->sighand could have gone meanwhile. 3) Even if the process is perfectly fine, permissions may have changed, and we can no longer send it a signal. So what we do instead, is to save the PID and uid's of the process, and introduce a new kill_proc_info_as_uid() function. Signed-off-by: Harald Welte <laforge@gnumonks.org> [ Fixed up types and added symbol exports ] Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * [PATCH] x86_64: Set up safe page tables during resumeRafael J. Wysocki2005-10-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following patch makes swsusp avoid the possible temporary corruption of page translation tables during resume on x86-64. This is achieved by creating a copy of the relevant page tables that will not be modified by swsusp and can be safely used by it on resume. The problem is that during resume on x86-64 swsusp may temporarily corrupt the page tables used for the direct mapping of RAM. If that happens, a page fault occurs and cannot be handled properly, which leads to the solid hang of the affected system. This leads to the loss of the system's state from before suspend and may result in the loss of data or the corruption of filesystems, so it is a serious issue. Also, it appears to happen quite often (for me, as often as 50% of the time). The problem is related to the fact that (at least) one of the PMD entries used in the direct memory mapping (starting at PAGE_OFFSET) points to a page table the physical address of which is much greater than the physical address of the PMD entry itself. Moreover, unfortunately, the physical address of the page table before suspend (i.e. the one stored in the suspend image) happens to be different to the physical address of the corresponding page table used during resume (i.e. the one that is valid right before swsusp_arch_resume() in arch/x86_64/kernel/suspend_asm.S is executed). Thus while the image is restored, the "offending" PMD entry gets overwritten, so it does not point to the right physical address any more (i.e. there's no page table at the address pointed to by it, because it points to the address the page table has been at during suspend). Consequently, if the PMD entry is used later on, and it _is_ used in the process of copying the image pages, a page fault occurs, but it cannot be handled in the normal way and the system hangs. In principle we can call create_resume_mapping() from swsusp_arch_resume() (ie. from suspend_asm.S), but then the memory allocations in create_resume_mapping(), resume_pud_mapping(), and resume_pmd_mapping() must be made carefully so that we use _only_ NosaveFree pages in them (the other pages are overwritten by the loop in swsusp_arch_resume()). Additionally, we are in atomic context at that time, so we cannot use GFP_KERNEL. Moreover, if one of the allocations fails, we should free all of the allocated pages, so we need to trace them somehow. All of this is done in the appended patch, except that the functions populating the page tables are located in arch/x86_64/kernel/suspend.c rather than in init.c. It may be done in a more elegan way in the future, with the help of some swsusp patches that are in the works now. [AK: move some externs into headers, renamed a function] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * [PATCH] gfp flags annotations - part 1Al Viro2005-10-08
| | | | | | | | | | | | | | | | | | | | | | | | - added typedef unsigned int __nocast gfp_t; - replaced __nocast uses for gfp flags with gfp_t - it gives exactly the same warnings as far as sparse is concerned, doesn't change generated code (from gcc point of view we replaced unsigned int with typedef) and documents what's going on far better. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2005-10-08
| |\