aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/filesystems
Commit message (Collapse)AuthorAge
* FS-Cache: Add use of /proc and presentation of statisticsDavid Howells2009-04-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | Make FS-Cache create its /proc interface and present various statistical information through it. Also provide the functions for updating this information. These features are enabled by: CONFIG_FSCACHE_PROC CONFIG_FSCACHE_STATS CONFIG_FSCACHE_HISTOGRAM The /proc directory for FS-Cache is also exported so that caching modules can add their own statistics there too. The FS-Cache module is loadable at this point, and the statistics files can be examined by userspace: cat /proc/fs/fscache/stats cat /proc/fs/fscache/histogram Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
* FS-Cache: Add the FS-Cache cache backend API and documentationDavid Howells2009-04-03
| | | | | | | | | | | | | | | | | | | | | Add the API for a generic facility (FS-Cache) by which caches may declare them selves open for business, and may obtain work to be done from network filesystems. The header file is included by: #include <linux/fscache-cache.h> Documentation for the API is also added to: Documentation/filesystems/caching/backend-api.txt This API is not usable without the implementation of the utility functions which will be added in further patches. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
* FS-Cache: Add the FS-Cache netfs API and documentationDavid Howells2009-04-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the API for a generic facility (FS-Cache) by which filesystems (such as AFS or NFS) may call on local caching capabilities without having to know anything about how the cache works, or even if there is a cache: +---------+ | | +--------------+ | NFS |--+ | | | | | +-->| CacheFS | +---------+ | +----------+ | | /dev/hda5 | | | | | +--------------+ +---------+ +-->| | | | | | |--+ | AFS |----->| FS-Cache | | | | |--+ +---------+ +-->| | | | | | | +--------------+ +---------+ | +----------+ | | | | | | +-->| CacheFiles | | ISOFS |--+ | /var/cache | | | +--------------+ +---------+ General documentation and documentation of the netfs specific API are provided in addition to the header files. As this patch stands, it is possible to build a filesystem against the facility and attempt to use it. All that will happen is that all requests will be immediately denied as if no cache is present. Further patches will implement the core of the facility. The facility will transfer requests from networking filesystems to appropriate caches if possible, or else gracefully deny them. If this facility is disabled in the kernel configuration, then all its operations will trivially reduce to nothing during compilation. WHY NOT I_MAPPING? ================== I have added my own API to implement caching rather than using i_mapping to do this for a number of reasons. These have been discussed a lot on the LKML and CacheFS mailing lists, but to summarise the basics: (1) Most filesystems don't do hole reportage. Holes in files are treated as blocks of zeros and can't be distinguished otherwise, making it difficult to distinguish blocks that have been read from the network and cached from those that haven't. (2) The backing inode must be fully populated before being exposed to userspace through the main inode because the VM/VFS goes directly to the backing inode and does not interrogate the front inode's VM ops. Therefore: (a) The backing inode must fit entirely within the cache. (b) All backed files currently open must fit entirely within the cache at the same time. (c) A working set of files in total larger than the cache may not be cached. (d) A file may not grow larger than the available space in the cache. (e) A file that's open and cached, and remotely grows larger than the cache is potentially stuffed. (3) Writes go to the backing filesystem, and can only be transferred to the network when the file is closed. (4) There's no record of what changes have been made, so the whole file must be written back. (5) The pages belong to the backing filesystem, and all metadata associated with that page are relevant only to the backing filesystem, and not anything stacked atop it. OVERVIEW ======== FS-Cache provides (or will provide) the following facilities: (1) Caches can be added / removed at any time, even whilst in use. (2) Adds a facility by which tags can be used to refer to caches, even if they're not available yet. (3) More than one cache can be used at once. Caches can be selected explicitly by use of tags. (4) The netfs is provided with an interface that allows either party to withdraw caching facilities from a file (required for (1)). (5) A netfs may annotate cache objects that belongs to it. This permits the storage of coherency maintenance data. (6) Cache objects will be pinnable and space reservations will be possible. (7) The interface to the netfs returns as few errors as possible, preferring rather to let the netfs remain oblivious. (8) Cookies are used to represent indices, files and other objects to the netfs. The simplest cookie is just a NULL pointer - indicating nothing cached there. (9) The netfs is allowed to propose - dynamically - any index hierarchy it desires, though it must be aware that the index search function is recursive, stack space is limited, and indices can only be children of indices. (10) Indices can be used to group files together to reduce key size and to make group invalidation easier. The use of indices may make lookup quicker, but that's cache dependent. (11) Data I/O is effectively done directly to and from the netfs's pages. The netfs indicates that page A is at index B of the data-file represented by cookie C, and that it should be read or written. The cache backend may or may not start I/O on that page, but if it does, a netfs callback will be invoked to indicate completion. The I/O may be either synchronous or asynchronous. (12) Cookies can be "retired" upon release. At this point FS-Cache will mark them as obsolete and the index hierarchy rooted at that point will get recycled. (13) The netfs provides a "match" function for index searches. In addition to saying whether a match was made or not, this can also specify that an entry should be updated or deleted. FS-Cache maintains a virtual index tree in which all indices, files, objects and pages are kept. Bits of this tree may actually reside in one or more caches. FSDEF | +------------------------------------+ | | NFS AFS | | +--------------------------+ +-----------+ | | | | homedir mirror afs.org redhat.com | | | +------------+ +---------------+ +----------+ | | | | | | 00001 00002 00007 00125 vol00001 vol00002 | | | | | +---+---+ +-----+ +---+ +------+------+ +-----+----+ | | | | | | | | | | | | | PG0 PG1 PG2 PG0 XATTR PG0 PG1 DIRENT DIRENT DIRENT R/W R/O Bak | | PG0 +-------+ | | 00001 00003 | +---+---+ | | | PG0 PG1 PG2 In the example above, two netfs's can be seen to be backed: NFS and AFS. These have different index hierarchies: (*) The NFS primary index will probably contain per-server indices. Each server index is indexed by NFS file handles to get data file objects. Each data file objects can have an array of pages, but may also have further child objects, such as extended attributes and directory entries. Extended attribute objects themselves have page-array contents. (*) The AFS primary index contains per-cell indices. Each cell index contains per-logical-volume indices. Each of volume index contains up to three indices for the read-write, read-only and backup mirrors of those volumes. Each of these contains vnode data file objects, each of which contains an array of pages. The very top index is the FS-Cache master index in which individual netfs's have entries. Any index object may reside in more than one cache, provided it only has index children. Any index with non-index object children will be assumed to only reside in one cache. The FS-Cache overview can be found in: Documentation/filesystems/caching/fscache.txt The netfs API to FS-Cache can be found in: Documentation/filesystems/caching/netfs-api.txt Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
* documentation: update Documentation/filesystem/proc.txt and ↵Shen Feng2009-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Documentation/sysctls Now /proc/sys is described in many places and much information is redundant. This patch updates the proc.txt and move the /proc/sys desciption out to the files in Documentation/sysctls. Details are: merge - 2.1 /proc/sys/fs - File system data - 2.11 /proc/sys/fs/mqueue - POSIX message queues filesystem - 2.17 /proc/sys/fs/epoll - Configuration options for the epoll interface with Documentation/sysctls/fs.txt. remove - 2.2 /proc/sys/fs/binfmt_misc - Miscellaneous binary formats since it's not better then the Documentation/binfmt_misc.txt. merge - 2.3 /proc/sys/kernel - general kernel parameters with Documentation/sysctls/kernel.txt remove - 2.5 /proc/sys/dev - Device specific parameters since it's obsolete the sysfs is used now. remove - 2.6 /proc/sys/sunrpc - Remote procedure calls since it's not better then the Documentation/sysctls/sunrpc.txt move - 2.7 /proc/sys/net - Networking stuff - 2.9 Appletalk - 2.10 IPX to newly created Documentation/sysctls/net.txt. remove - 2.8 /proc/sys/net/ipv4 - IPV4 settings since it's not better then the Documentation/networking/ip-sysctl.txt. add - Chapter 3 Per-Process Parameters to descibe /proc/<pid>/xxx parameters. Signed-off-by: Shen Feng <shen@cn.fujitsu.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for_linus' of ↵Linus Torvalds2009-04-01
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (33 commits) ext4: Regularize mount options ext4: fix locking typo in mballoc which could cause soft lockup hangs ext4: fix typo which causes a memory leak on error path jbd2: Update locking coments ext4: Rename pa_linear to pa_type ext4: add checks of block references for non-extent inodes ext4: Check for an valid i_mode when reading the inode from disk ext4: Use WRITE_SYNC for commits which are caused by fsync() ext4: Add auto_da_alloc mount option ext4: Use struct flex_groups to calculate get_orlov_stats() ext4: Use atomic_t's in struct flex_groups ext4: remove /proc tuning knobs ext4: Add sysfs support ext4: Track lifetime disk writes ext4: Fix discard of inode prealloc space with delayed allocation. ext4: Automatically allocate delay allocated blocks on rename ext4: Automatically allocate delay allocated blocks on close ext4: add EXT4_IOC_ALLOC_DA_BLKS ioctl ext4: Simplify delalloc code by removing mpage_da_writepages() ext4: Save stack space by removing fake buffer heads ...
| * ext4: Regularize mount optionsTheodore Ts'o2009-03-28
| | | | | | | | | | | | | | | | | | | | Add support for using the mount options "barrier" and "nobarrier", and "auto_da_alloc" and "noauto_da_alloc", which is more consistent than "barrier=<0|1>" or "auto_da_alloc=<0|1>". Most other ext3/ext4 mount options use the foo/nofoo naming convention. We allow the old forms of these mount options for backwards compatibility. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: remove /proc tuning knobsTheodore Ts'o2009-03-31
| | | | | | | | | | | | | | Remove tuning knobs in /proc/fs/ext4/<dev/* since they have been replaced by knobs in sysfs at /sys/fs/ext4/<dev>/*. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Add fine print for the 32000 subdirectory limitTheodore Ts'o2009-02-23
| | | | | | | | | | | | | | | | | | Some poeple are reading the ext4 feature list too literally and create dubious test cases involving very long filenames and 1k blocksize and then complain when they run into an htree-imposed limit. So add fine print to the "fix 32000 subdirectory limit" ext4 feature. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* | Merge branch 'linux-next' of ↵Linus Torvalds2009-04-01
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (88 commits) PCI: fix HT MSI mapping fix PCI: don't enable too much HT MSI mapping x86/PCI: make pci=lastbus=255 work when acpi is on PCI: save and restore PCIe 2.0 registers PCI: update fakephp for bus_id removal PCI: fix kernel oops on bridge removal PCI: fix conflict between SR-IOV and config space sizing powerpc/PCI: include pci.h in powerpc MSI implementation PCI Hotplug: schedule fakephp for feature removal PCI Hotplug: rename legacy_fakephp to fakephp PCI Hotplug: restore fakephp interface with complete reimplementation PCI: Introduce /sys/bus/pci/devices/.../rescan PCI: Introduce /sys/bus/pci/devices/.../remove PCI: Introduce /sys/bus/pci/rescan PCI: Introduce pci_rescan_bus() PCI: do not enable bridges more than once PCI: do not initialize bridges more than once PCI: always scan child buses PCI: pci_scan_slot() returns newly found devices PCI: don't scan existing devices ... Fix trivial append-only conflict in Documentation/feature-removal-schedule.txt
| * | PCI: Introduce /sys/bus/pci/devices/.../removeAlex Chiang2009-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds an attribute named "remove" to a PCI device's sysfs directory. Writing a non-zero value to this attribute will remove the PCI device and any children of it. Trent Piepho wrote the original implementation and documentation. Thanks to Vegard Nossum for testing under kmemcheck and finding locking issues with the sysfs interface. Cc: Trent Piepho <xyzzy@speakeasy.org> Tested-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* | | mm: page_mkwrite change prototype to match faultNick Piggin2009-04-01
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change the page_mkwrite prototype to take a struct vm_fault, and return VM_FAULT_xxx flags. There should be no functional change. This makes it possible to return much more detailed error information to the VM (and also can provide more information eg. virtual_address to the driver, which might be important in some special cases). This is required for a subsequent fix. And will also make it easier to merge page_mkwrite() with fault() in future. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Chris Mason <chris.mason@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Artem Bityutskiy <dedekind@infradead.org> Cc: Felix Blyakher <felixb@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'bkl-removal' of git://git.lwn.net/linux-2.6Linus Torvalds2009-03-26
|\ \ | | | | | | | | | | | | | | | | | | | | | * 'bkl-removal' of git://git.lwn.net/linux-2.6: Rationalize fasync return values Move FASYNC bit handling to f_op->fasync() Use f_lock to protect f_flags Rename struct file->f_ep_lock
| * | Move FASYNC bit handling to f_op->fasync()Jonathan Corbet2009-03-16
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Removing the BKL from FASYNC handling ran into the challenge of keeping the setting of the FASYNC bit in filp->f_flags atomic with regard to calls to the underlying fasync() function. Andi Kleen suggested moving the handling of that bit into fasync(); this patch does exactly that. As a result, we have a couple of internal API changes: fasync() must now manage the FASYNC bit, and it will be called without the BKL held. As it happens, every fasync() implementation in the kernel with one exception calls fasync_helper(). So, if we make fasync_helper() set the FASYNC bit, we can avoid making any changes to the other fasync() functions - as long as those functions, themselves, have proper locking. Most fasync() implementations do nothing but call fasync_helper() - which has its own lock - so they are easily verified as correct. The BKL had already been pushed down into the rest. The networking code has its own version of fasync_helper(), so that code has been augmented with explicit FASYNC bit handling. Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Miller <davem@davemloft.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
* | trivial: fix orphan dates in ext2 documentationJody McIntyre2009-03-23
| | | | | | | | | | | | | | | | | | Revert the change to the orphan dates of Windows 95, DOS, compression. Add a new orphan date for OS/2. Signed-off-by: Jody McIntyre <scjody@sun.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2009-03-23
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (32 commits) ucc_geth: Fix oops when using fixed-link support dm9000: locking bugfix net: update dnet.c for bus_id removal dnet: DNET should depend on HAS_IOMEM dca: add missing copyright/license headers nl80211: Check that function pointer != NULL before using it sungem: missing net_device_ops be2net: fix to restore vlan ids into BE2 during a IF DOWN->UP cycle be2net: replenish when posting to rx-queue is starved in out of mem conditions bas_gigaset: correctly allocate USB interrupt transfer buffer smsc911x: reset last known duplex and carrier on open sh_eth: Fix mistake of the address of SH7763 sh_eth: Change handling of IRQ netns: oops in ip[6]_frag_reasm incrementing stats net: kfree(napi->skb) => kfree_skb net: fix sctp breakage ipv6: fix display of local and remote sit endpoints net: Document /proc/sys/net/core/netdev_budget tulip: fix crash on iface up with shirq debug virtio_net: Make virtio_net support carrier detection ...
| * net: Document /proc/sys/net/core/netdev_budgetStanislaw Gruszka2009-03-18
| | | | | | | | | | | | | | | | | | The NAPI poll parameter netdev_budget is not documented in kernel-docs. Since it may have a substantial effect on at least some network loads, it should be. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | trivial: fix bad links in the ext2 and ext3 documentationJody McIntyre2009-03-12
| | | | | | | | | | | | | | Trivial patch to fix bad links in the ext2 and ext3 documentation. Signed-off-by: Jody McIntyre <scjody@sun.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Squashfs: fix documentation typo, Cramfs filesystem limit is 256 MiBPhillip Lougher2009-03-04
|/ | | | Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
* PATCH [2/2] Documentation/filesystems/sysfs.txt: fix descriptions of device ↵Mike Murphy2009-02-22
| | | | | | | | | | attributes Fix descriptions of device attributes to be consistent with the actual implementations in include/linux/device.h Signed-off-by: Mike Murphy <mamurph[at]cs.clemson.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* PCI: return error on failure to read PCI ROMsTimothy S. Nelson2009-02-04
| | | | | | | | | | | | This patch makes the ROM reading code return an error to user space if the size of the ROM read is equal to 0. The patch also emits a warnings if the contents of the ROM are invalid, and documents the effects of the "enable" file on ROM reading. Signed-off-by: Timothy S. Nelson <wayland@wayland.id.au> Acked-by: Alex Villacis-Lasso <a_villacis@palosanto.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6Linus Torvalds2009-02-03
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'linux-next' of git://git.infradead.org/ubifs-2.6: UBIFS: remove fast unmounting UBIFS: return sensible error codes UBIFS: remount ro fixes UBIFS: spelling fix 'date' -> 'data' UBIFS: sync wbufs after syncing inodes and pages UBIFS: fix LPT out-of-space bug (again) UBIFS: fix no_chk_data_crc UBIFS: fix assertions UBIFS: ensure orphan area head is initialized UBIFS: always clean up GC LEB space UBIFS: add re-mount debugging checks UBIFS: fix LEB list freeing UBIFS: simplify locking UBIFS: document dark_wm and dead_wm better UBIFS: do not treat all data as short term UBIFS: constify operations UBIFS: do not commit twice
| * UBIFS: remove fast unmountingArtem Bityutskiy2009-01-29
| | | | | | | | | | | | | | | | | | | | This UBIFS feature has never worked properly, and it was a mistake to add it because we simply have no use-cases. So, lets still accept the fast_unmount mount option, but ignore it. This does not change much, because UBIFS commit in sync_fs anyway, and sync_fs is called while unmounting. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
* | mm: OOM documentation updateEvgeniy Polyakov2009-01-29
| | | | | | | | | | | | | | | | Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net> Acked-by: David Rientjes <rientjes@google.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | update port number in NFS/RDMA documentationJames Lentini2009-01-27
|/ | | | | | | | Update the NFS/RDMA documentation to use the new port number assigned by IANA. Signed-off-by: James Lentini <jlentini@netapp.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* Update of Documentation: vm.txt and proc.txtPeter W Morreale2009-01-15
| | | | | | | | | | | | | | | Update Documentation/sysctl/vm.txt and Documentation/filesystems/proc.txt. More specifically, the section on /proc/sys/vm in Documentation/filesystems/proc.txt was removed and a link to Documentation/sysctl/vm.txt added. Most of the verbiage from proc.txt was simply moved in vm.txt, with new addtional text for "swappiness" and "stat_interval". Signed-off-by: Peter W Morreale <pmorreale@novell.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* filesystem freeze: add error handling of write_super_lockfs/unlockfsTakashi Sato2009-01-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, ext3 in mainline Linux doesn't have the freeze feature which suspends write requests. So, we cannot take a backup which keeps the filesystem's consistency with the storage device's features (snapshot and replication) while it is mounted. In many case, a commercial filesystem (e.g. VxFS) has the freeze feature and it would be used to get the consistent backup. If Linux's standard filesystem ext3 has the freeze feature, we can do it without a commercial filesystem. So I have implemented the ioctls of the freeze feature. I think we can take the consistent backup with the following steps. 1. Freeze the filesystem with the freeze ioctl. 2. Separate the replication volume or create the snapshot with the storage device's feature. 3. Unfreeze the filesystem with the unfreeze ioctl. 4. Take the backup from the separated replication volume or the snapshot. This patch: VFS: Changed the type of write_super_lockfs and unlockfs from "void" to "int" so that they can return an error. Rename write_super_lockfs and unlockfs of the super block operation freeze_fs and unfreeze_fs to avoid a confusion. ext3, ext4, xfs, gfs2, jfs: Changed the type of write_super_lockfs and unlockfs from "void" to "int" so that write_super_lockfs returns an error if needed, and unlockfs always returns 0. reiserfs: Changed the type of write_super_lockfs and unlockfs from "void" to "int" so that they always return 0 (success) to keep a current behavior. Signed-off-by: Takashi Sato <t-sato@yk.jp.nec.com> Signed-off-by: Masayuki Hamaguchi <m-hamaguchi@ys.jp.nec.com> Cc: <xfs-masters@oss.sgi.com> Cc: <linux-ext4@vger.kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linusLinus Torvalds2009-01-09
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus: MAINTAINERS: squashfs entry Squashfs: documentation Squashfs: initrd support Squashfs: Kconfig entry Squashfs: Makefiles Squashfs: header files Squashfs: block operations Squashfs: cache operations Squashfs: uid/gid lookup operations Squashfs: fragment block operations Squashfs: export operations Squashfs: super block operations Squashfs: symlink operations Squashfs: regular file operations Squashfs: directory readdir operations Squashfs: directory lookup operations Squashfs: inode operations
| * Squashfs: documentationPhillip Lougher2009-01-05
| | | | | | | | Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds2009-01-09
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (864 commits) Btrfs: explicitly mark the tree log root for writeback Btrfs: Drop the hardware crc32c asm code Btrfs: Add Documentation/filesystem/btrfs.txt, remove old COPYING Btrfs: kmap_atomic(KM_USER0) is safe for btrfs_readpage_end_io_hook Btrfs: Don't use kmap_atomic(..., KM_IRQ0) during checksum verifies Btrfs: tree logging checksum fixes Btrfs: don't change file extent's ram_bytes in btrfs_drop_extents Btrfs: Use btrfs_join_transaction to avoid deadlocks during snapshot creation Btrfs: drop remaining LINUX_KERNEL_VERSION checks and compat code Btrfs: drop EXPORT symbols from extent_io.c Btrfs: Fix checkpatch.pl warnings Btrfs: Fix free block discard calls down to the block layer Btrfs: avoid orphan inode caused by log replay Btrfs: avoid potential super block corruption Btrfs: do not call kfree if kmalloc failed in btrfs_sysfs_add_super Btrfs: fix a memory leak in btrfs_get_sb Btrfs: Fix typo in clear_state_cb Btrfs: Fix memset length in btrfs_file_write Btrfs: update directory's size when creating subvol/snapshot Btrfs: add permission checks to the ioctls ...
| * | Btrfs: Add Documentation/filesystem/btrfs.txt, remove old COPYINGDavid Woodhouse2009-01-07
| | | | | | | | | | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* | | Merge branch 'for_linus' of ↵Linus Torvalds2009-01-08
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (57 commits) jbd2: Fix oops in jbd2_journal_init_inode() on corrupted fs ext4: Remove "extents" mount option block: Add Kconfig help which notes that ext4 needs CONFIG_LBD ext4: Make printk's consistently prefixed with "EXT4-fs: " ext4: Add sanity checks for the superblock before mounting the filesystem ext4: Add mount option to set kjournald's I/O priority jbd2: Submit writes to the journal using WRITE_SYNC jbd2: Add pid and journal device name to the "kjournald2 starting" message ext4: Add markers for better debuggability ext4: Remove code to create the journal inode ext4: provide function to release metadata pages under memory pressure ext3: provide function to release metadata pages under memory pressure add releasepage hooks to block devices which can be used by file systems ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc ext4: Init the complete page while building buddy cache ext4: Don't allow new groups to be added during block allocation ext4: mark the blocks/inode bitmap beyond end of group as used ext4: Use new buffer_head flag to check uninit group bitmaps initialization ext4: Fix the race between read_inode_bitmap() and ext4_new_inode() ext4: code cleanup ...
| * | | ext4: Remove "extents" mount optionTheodore Ts'o2009-01-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This mount option is largely superfluous, and in fact the way it was implemented was buggy; if a filesystem which did not have the extents feature flag was mounted -o extents, the filesystem would attempt to create and use extents-based file even though the extents feature flag was not eabled. The simplest thing to do is to nuke the mount option entirely. It's not all that useful to force the non-creation of new extent-based files if the filesystem can support it. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | | ext4: Add mount option to set kjournald's I/O priorityTheodore Ts'o2009-01-05
| | | | | | | | | | | | | | | | | | | | Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jens Axboe <jens.axboe@oracle.com>
| * | | ext4: Remove code to create the journal inodeTheodore Ts'o2009-01-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This code has been obsolete in quite some time, since the supported method for adding a journal inode is to use tune2fs (or to creating new filesystem with a journal via mke2fs or mkfs.ext4). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | | ext4: add fsync batch tuning knobsTheodore Ts'o2009-01-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add new mount options, min_batch_time and max_batch_time, which controls how long the jbd2 layer should wait for additional filesystem operations to get batched with a synchronous write transaction. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | | Update Documentation/filesystems/ext4.txtTheodore Ts'o2009-01-06
| | |/ | |/| | | | | | | | | | | | | Fix paragraph with recommendations on how to tune ext4 for benchmarks. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* | | Merge branch 'proc-linus' of ↵Linus Torvalds2009-01-07
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc * 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc: proc: remove write-only variable in proc_pident_lookup() proc: fix sparse warning proc: add /proc/*/stack proc: remove '##' usage proc: remove useless WARN_ONs proc: stop using BKL
| * | | proc: add /proc/*/stackKen Chen2009-01-05
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | /proc/*/stack adds the ability to query a task's stack trace. It is more useful than /proc/*/wchan as it provides full stack trace instead of single depth. Example output: $ cat /proc/self/stack [<c010a271>] save_stack_trace_tsk+0x17/0x35 [<c01827b4>] proc_pid_stack+0x4a/0x76 [<c018312d>] proc_single_show+0x4a/0x5e [<c016bdec>] seq_read+0xf3/0x29f [<c015a004>] vfs_read+0x6d/0x91 [<c015a0c1>] sys_read+0x3b/0x60 [<c0102eda>] syscall_call+0x7/0xb [<ffffffff>] 0xffffffff [add save_stack_trace_tsk() on mips, ACK Ralf --adobriyan] Signed-off-by: Ken Chen <kenchen@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
* | | poll: allow f_op->poll to sleepTejun Heo2009-01-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | f_op->poll is the only vfs operation which is not allowed to sleep. It's because poll and select implementation used task state to synchronize against wake ups, which doesn't have to be the case anymore as wait/wake interface can now use custom wake up functions. The non-sleep restriction can be a bit tricky because ->poll is not called from an atomic context and the result of accidentally sleeping in ->poll only shows up as temporary busy looping when the timing is right or rather wrong. This patch converts poll/select to use custom wake up function and use separate triggered variable to synchronize against wake up events. The only added overhead is an extra function call during wake up and negligible. This patch removes the one non-sleep exception from vfs locking rules and is beneficial to userland filesystem implementations like FUSE, 9p or peculiar fs like spufs as it's very difficult for those to implement non-sleeping poll method. While at it, make the following cosmetic changes to make poll.h and select.c checkpatch friendly. * s/type * symbol/type *symbol/ : three places in poll.h * remove blank line before EXPORT_SYMBOL() : two places in select.c Oleg: spotted missing barrier in poll_schedule_timeout() Davide: spotted missing write barrier in pollwake() Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Ingo Molnar <mingo@elte.hu> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Brad Boyer <flar@allandria.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Roland McGrath <roland@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | mm: add dirty_background_bytes and dirty_bytes sysctlsDavid Rientjes2009-01-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change introduces two new sysctls to /proc/sys/vm: dirty_background_bytes and dirty_bytes. dirty_background_bytes is the counterpart to dirty_background_ratio and dirty_bytes is the counterpart to dirty_ratio. With growing memory capacities of individual machines, it's no longer sufficient to specify dirty thresholds as a percentage of the amount of dirtyable memory over the entire system. dirty_background_bytes and dirty_bytes specify quantities of memory, in bytes, that represent the dirty limits for the entire system. If either of these values is set, its value represents the amount of dirty memory that is needed to commence either background or direct writeback. When a `bytes' or `ratio' file is written, its counterpart becomes a function of the written value. For example, if dirty_bytes is written to be 8096, 8K of memory is required to commence direct writeback. dirty_ratio is then functionally equivalent to 8K / the amount of dirtyable memory: dirtyable_memory = free pages + mapped pages + file cache dirty_background_bytes = dirty_background_ratio * dirtyable_memory -or- dirty_background_ratio = dirty_background_bytes / dirtyable_memory AND dirty_bytes = dirty_ratio * dirtyable_memory -or- dirty_ratio = dirty_bytes / dirtyable_memory Only one of dirty_background_bytes and dirty_background_ratio may be specified at a time, and only one of dirty_bytes and dirty_ratio may be specified. When one sysctl is written, the other appears as 0 when read. The `bytes' files operate on a page size granularity since dirty limits are compared with ZVC values, which are in page units. Prior to this change, the minimum dirty_ratio was 5 as implemented by get_dirty_limits() although /proc/sys/vm/dirty_ratio would show any user written value between 0 and 100. This restriction is maintained, but dirty_bytes has a lower limit of only one page. Also prior to this change, the dirty_background_ratio could not equal or exceed dirty_ratio. This restriction is maintained in addition to restricting dirty_background_bytes. If either background threshold equals or exceeds that of the dirty threshold, it is implicitly set to half the dirty threshold. Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: David Rientjes <rientjes@google.com> Cc: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | ocfs2: add mount option and Kconfig option for aclTiger Yang2009-01-05
|/ / | | | | | | | | | | | | | | This patch adds the Kconfig option "CONFIG_OCFS2_FS_POSIX_ACL" and mount options "acl" to enable acls in Ocfs2. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
* | Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6Linus Torvalds2009-01-02
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'linux-next' of git://git.infradead.org/ubifs-2.6: (33 commits) UBIFS: add more useful debugging prints UBIFS: print debugging messages properly UBIFS: fix numerous spelling mistakes UBIFS: allow mounting when short of space UBIFS: fix writing uncompressed files UBIFS: fix checkpatch.pl warnings UBIFS: fix sparse warnings UBIFS: simplify make_free_space UBIFS: do not lie about used blocks UBIFS: restore budg_uncommitted_idx UBIFS: always commit on unmount UBIFS: use ubi_sync UBIFS: always commit in sync_fs UBIFS: fix file-system synchronization UBIFS: fix constants initialization UBIFS: avoid unnecessary calculations UBIFS: re-calculate min_idx_size after the commit UBIFS: use nicer 64-bit math UBIFS: fix available blocks count UBIFS: various comment improvements and fixes ...
| * | UBIFS: fix numerous spelling mistakesArtem Bityutskiy2008-12-31
| | | | | | | | | | | | Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * | UBIFS: introduce compression mount optionsArtem Bityutskiy2008-12-03
| |/ | | | | | | | | | | | | | | It is very handy to be able to change default UBIFS compressor via mount options. Introduce -o compr=<name> mount option support. Currently only "none", "lzo" and "zlib" compressors are supported. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
* | Document usage of multiple-instances of devptsSukadev Bhattiprolu2009-01-02
| | | | | | | | | | | | | | | | | | | | Changelog [v2]: - Add note indicating strict isolation is not possible unless all mounts of devpts use the 'newinstance' mount option. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | kill ->dir_notify()Al Viro2008-12-31
| | | | | | | | | | | | | | | | | | | | | | Remove the hopelessly misguided ->dir_notify(). The only instance (cifs) has been broken by design from the very beginning; the objects it creates are never destroyed, keep references to struct file they can outlive, nothing that could possibly evict them exists on close(2) path *and* no locking whatsoever is done to prevent races with close(), should the previous, er, deficiencies someday be dealt with. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | fix f_count description in Documentation/filesystems/files.txtEric Dumazet2008-12-31
| | | | | | | | | | | | | | | | Documentation/filesystems/files.txt was not updated when f_count became an atomic_long_t. atomic_long_inc_not_zero() is now used instead of atomic_inc_not_zero() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | correct wrong function name of d_put in kernel document and source commentZhaolei2008-12-31
| | | | | | | | | | | | | | | | | | | | | | no function named d_put(), it should be dput(). Impact: fix document and comment, no functionality changed Signed-off-by: Zhao Lei <zhaolei@cn.fuijtsu.com> Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | [XFS] Fix merge failuresLachlan McIlroy2008-12-29
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: fs/xfs/linux-2.6/xfs_cred.h fs/xfs/linux-2.6/xfs_globals.h fs/xfs/linux-2.6/xfs_ioctl.c fs/xfs/xfs_vnodeops.h Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
| | \
| | \
| | \
| | \
| | \
| | \
| | \
| | \
| *-------. \ Merge branches 'x86/apic', 'x86/cleanups', 'x86/cpufeature', ↵Ingo Molnar2008-12-23
| |\ \ \ \ \ \ | | |_|_|_|_|/ | |/| | | | | | | | | | | | 'x86/crashdump', 'x86/debug', 'x86/defconfig', 'x86/detect-hyper', 'x86/doc', 'x86/dumpstack', 'x86/early-printk', 'x86/fpu', 'x86/idle', 'x86/io', 'x86/memory-corruption-check', 'x86/microcode', 'x86/mm', 'x86/mtrr', 'x86/nmi-watchdog', 'x86/pat2', 'x86/pci-ioapic-boot-irq-quirks', 'x86/ptrace', 'x86/quirks', 'x86/reboot', 'x86/setup-memory', 'x86/signal', 'x86/sparse-fixes', 'x86/time', 'x86/uv' and 'x86/xen' into x86/core