Commit message  Author  Age
* Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx  Linus Torvalds  2009-01-09
|\
| |   * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (22 commits)
| |     ioat: fix self test for multi-channel case
| |     dmaengine: bump initcall level to arch_initcall
| |     dmaengine: advertise all channels on a device to dma_filter_fn
| |     dmaengine: use idr for registering dma device numbers
| |     dmaengine: add a release for dma class devices and dependent infrastructure
| |     ioat: do not perform removal actions at shutdown
| |     iop-adma: enable module removal
| |     iop-adma: kill debug BUG_ON
| |     iop-adma: let devm do its job, don't duplicate free
| |     dmaengine: kill enum dma_state_client
| |     dmaengine: remove 'bigref' infrastructure
| |     dmaengine: kill struct dma_client and supporting infrastructure
| |     dmaengine: replace dma_async_client_register with dmaengine_get
| |     atmel-mci: convert to dma_request_channel and down-level dma_slave
| |     dmatest: convert to dma_request_channel
| |     dmaengine: introduce dma_request_channel and private channels
| |     net_dma: convert to dma_find_channel
| |     dmaengine: provide a common 'issue_pending_all' implementation
| |     dmaengine: centralize channel allocation, introduce dma_find_channel
| |     dmaengine: up-level reference counting to the module level
| |     ...
| * ioat: fix self test for multi-channel case  Dan Williams  2009-01-06
| |   In the multiple device case we need to re-arm the completion and protect against concurrent self-tests. The printk from the test callback is removed as it can arbitrarily delay completion of the test.
| |
| |   Cc: <stable@kernel.org>
| |   Cc: Maciej Sosnowski <maciej.sosnowski@intel.com>
| |   Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * dmaengine: bump initcall level to arch_initcall  Dan Williams  2009-01-06
| |   There are dmaengine users that would like to register dma devices at subsys_initcall time to ensure channels are available by device_initcall time.
| |
| |   Cc: Maciej Sosnowski <maciej.sosnowski@intel.com>
| |   Cc: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
| |   Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
| |   Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * dmaengine: advertise all channels on a device to dma_filter_fn  Dan Williams  2009-01-06
| |   Allow dma_filter_fn routines to disambiguate multiple channels on a device rather than assuming that all channels on a device are equal.
| |
| |   Cc: Maciej Sosnowski <maciej.sosnowski@intel.com>
| |   Reported-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
| |   Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * dmaengine: use idr for registering dma device numbers  Dan Williams  2009-01-06
| |   This brings some predictability to dma device numbers, i.e. an rmmod/insmod cycle may now result in /sys/class/dma/dma0chan0 being restored rather than /sys/class/dma/dma1chan0 appearing.
| |
| |   Cc: Maciej Sosnowski <maciej.sosnowski@intel.com>
| |   Signed-off-by: Dan Williams <dan.j.williams@intel.com>
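The predictability described in this commit follows from reusing the lowest free number instead of counting upward forever. The following is a minimal user-space sketch of that idea only; it uses a small fixed table rather than the kernel's idr, and `MAX_IDS`, `id_alloc` and `id_free` are hypothetical names invented for illustration.

```c
#include <stdbool.h>

#define MAX_IDS 8                 /* hypothetical cap, only for this sketch */

static bool id_in_use[MAX_IDS];   /* true while an id is handed out */

/* Return the lowest unused id, or -1 when every slot is taken. */
static int id_alloc(void)
{
    for (int i = 0; i < MAX_IDS; i++) {
        if (!id_in_use[i]) {
            id_in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Release an id so that a later id_alloc() can hand it out again. */
static void id_free(int id)
{
    if (id >= 0 && id < MAX_IDS)
        id_in_use[id] = false;
}
```

With this scheme, freeing id 0 (the "rmmod") and allocating again (the "insmod") yields 0 once more, which is the dma0chan0 behaviour the message describes.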
| * dmaengine: add a release for dma class devices and dependent infrastructure  Dan Williams  2009-01-06
| |   Resolves:
| |     WARNING: at drivers/base/core.c:122 device_release+0x4d/0x52()
| |     Device 'dma0chan0' does not have a release() function, it is broken and must be fixed.
| |
| |   The dma_chan_dev object is introduced to gear-match sysfs kobject and dmaengine channel lifetimes. When a channel is removed, access to the sysfs entries returns -ENODEV until the kobject can be released.
| |
| |   The bulk of the change is updates to existing code to handle the extra layer of indirection between a dma_chan and its struct device.
| |
| |   Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
| |   Acked-by: Stephen Hemminger <shemminger@vyatta.com>
| |   Cc: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
| |   Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * ioat: do not perform removal actions at shutdown  Dan Williams  2009-01-06
| |   Unregistering services should only happen at "remove" time. This prevents the device from being unregistered while dmaengine clients are still active. Also, the comment on ioat_remove is stale since removal is prevented while a channel may be in use.
| |
| |   Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
| |   Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * iop-adma: enable module removal  Dan Williams  2009-01-06
| |   Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * iop-adma: kill debug BUG_ON  Dan Williams  2009-01-06
| |   This BUG_ON caught problems in early development but now it is in the way as it invalidly triggers when trying to remove the module.
| |
| |   Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * iop-adma: let devm do its job, don't duplicate free  Dan Williams  2009-01-06
| |   No need to free stuff that the devm infrastructure will take care of...
| |
| |   Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * dmaengine: kill enum dma_state_client  Dan Williams  2009-01-06
| |   DMA_NAK is now useless. We can just use a bool instead.
| |
| |   Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
| |   Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * dmaengine: remove 'bigref' infrastructure  Dan Williams  2009-01-06
| |   Reference counting is done at the module level so clients need not worry that a channel will leave while they are actively using dmaengine.
| |
| |   Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
| |   Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * dmaengine: kill struct dma_client and supporting infrastructure  Dan Williams  2009-01-06
| |   All users have been converted to either the general-purpose allocator, dma_find_channel, or dma_request_channel.
| |
| |   Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
| |   Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * dmaengine: replace dma_async_client_register with dmaengine_get  Dan Williams  2009-01-06
| |   Now that clients no longer need to be notified of channel arrival dma_async_client_register can simply increment the dmaengine_ref_count.
| |
| |   Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
| |   Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * atmel-mci: convert to dma_request_channel and down-level dma_slave  Dan Williams  2009-01-06
| |   dma_request_channel provides an exclusive channel, so we no longer need to pass slave data through dmaengine.
| |
| |   Cc: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
| |   Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
| |   Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * dmatest: convert to dma_request_channel  Dan Williams  2009-01-06
| |   Replace the client registration infrastructure with a custom loop to poll for channels. Once dma_request_channel returns NULL stop asking for channels. A userspace side effect of this change is that loading the dmatest module before loading a dma driver will result in no channels being found; previously dmatest would get a callback. To facilitate testing in the built-in case dmatest_init is marked as a late_initcall. Another side effect is that channels under test can not be used for any other purpose.
| |
| |   Cc: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
| |   Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
| |   Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * dmaengine: introduce dma_request_channel and private channels  Dan Williams  2009-01-06
| |   This interface is primarily for device-to-memory clients which need to search for dma channels with platform-specific characteristics. The prototype is:
| |
| |     struct dma_chan *dma_request_channel(dma_cap_mask_t mask,
| |                                          dma_filter_fn filter_fn,
| |                                          void *filter_param);
| |
| |   When the optional 'filter_fn' parameter is set to NULL dma_request_channel simply returns the first channel that satisfies the capability mask. Otherwise, when the mask parameter is insufficient for specifying the necessary channel, the filter_fn routine can be used to disposition the available channels in the system. The filter_fn routine is called once for each free channel in the system. Upon seeing a suitable channel filter_fn returns DMA_ACK which flags that channel to be the return value from dma_request_channel. A channel allocated via this interface is exclusive to the caller, until dma_release_channel() is called.
| |
| |   To ensure that all channels are not consumed by the general-purpose allocator the DMA_PRIVATE capability is provided to exclude a dma_device from general-purpose (memory-to-memory) consideration.
| |
| |   Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
| |   Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
| |   Signed-off-by: Dan Williams <dan.j.williams@intel.com>
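The selection rule described in this message (capability mask first, then an optional filter callback, with the winner held exclusively until released) can be modelled outside the kernel. This is a hedged user-space sketch, not the dmaengine implementation: `struct chan`, `request_channel`, `release_channel` and `match_id` are hypothetical stand-ins, capabilities are a plain bitmask rather than `dma_cap_mask_t`, and the filter returns a `bool` instead of `DMA_ACK`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a DMA channel; not the kernel's struct dma_chan. */
struct chan {
    unsigned int caps;   /* bitmask of capabilities this channel offers */
    int id;              /* hypothetical platform detail a filter can inspect */
    bool busy;           /* true while a caller holds the channel exclusively */
};

/* Filter callback: return true to accept the offered channel. */
typedef bool (*filter_fn_t)(struct chan *chan, void *param);

/*
 * Walk the free channels, skip those lacking any requested capability,
 * and return the first one the filter accepts (or the first capable one
 * when no filter is given). The returned channel is marked busy.
 */
static struct chan *request_channel(struct chan *chans, size_t n,
                                    unsigned int mask,
                                    filter_fn_t fn, void *param)
{
    for (size_t i = 0; i < n; i++) {
        struct chan *c = &chans[i];

        if (c->busy || (c->caps & mask) != mask)
            continue;
        if (fn && !fn(c, param))
            continue;
        c->busy = true;
        return c;
    }
    return NULL;
}

/* Give the channel back so that later requests can see it again. */
static void release_channel(struct chan *c)
{
    if (c)
        c->busy = false;
}

/* Example filter: accept only the channel whose id equals *param. */
static bool match_id(struct chan *chan, void *param)
{
    return chan->id == *(int *)param;
}
```

A caller that needs one particular channel passes `match_id` with a pointer to the wanted id; a caller that only cares about capabilities passes `NULL` for the filter and receives the first free channel that satisfies the mask.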
| * net_dma: convert to dma_find_channel  Dan Williams  2009-01-06
| |   Use the general-purpose channel allocation provided by dmaengine.
| |
| |   Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
| |   Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * dmaengine: provide a common 'issue_pending_all' implementation  Dan Williams  2009-01-06
| |   async_tx and net_dma each have open-coded versions of issue_pending_all, so provide a common routine in dmaengine.
| |
| |   The implementation needs to walk the global device list, so implement rcu to allow dma_issue_pending_all to run lockless. Clients protect themselves from channel removal events by holding a dmaengine reference.
| |
| |   Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
| |   Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * dmaengine: centralize channel allocation, introduce dma_find_channel  Dan Williams  2009-01-06
| |   Allowing multiple clients to each define their own channel allocation scheme quickly leads to a pathological situation. For memory-to-memory offload all clients can share a central allocator.
| |
| |   This simply moves the existing async_tx allocator to dmaengine with minimal fixups:
| |   * async_tx.c:get_chan_ref_by_cap --> dmaengine.c:nth_chan
| |   * async_tx.c:async_tx_rebalance --> dmaengine.c:dma_channel_rebalance
| |   * split out common code from async_tx.c:__async_tx_find_channel --> dma_find_channel
| |
| |   Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
| |   Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * dmaengine: up-level reference counting to the module level  Dan Williams  2009-01-06
| |   Simply, if a client wants any dmaengine channel then prevent all dmaengine modules from being removed. Once the clients are done re-enable module removal.
| |
| |   Why?, beyond reducing complication:
| |   1/ Tracking reference counts per-transaction in an efficient manner, as is currently done, requires a complicated scheme to avoid cache-line bouncing effects.
| |   2/ Per-transaction ref-counting gives the false impression that a dma-driver can be gracefully removed ahead of its user (net, md, or dma-slave)
| |   3/ None of the in-tree dma-drivers talk to hot pluggable hardware, but if such an engine were built one day we still would not need to notify clients of remove events. The driver can simply return NULL to a ->prep() request, something that is much easier for a client to handle.
| |
| |   Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
| |   Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
| |   Signed-off-by: Dan Williams <dan.j.williams@intel.com>
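The policy stated in this message ("if a client wants any dmaengine channel then prevent all dmaengine modules from being removed") reduces to a single counter shared by all clients. The sketch below is a user-space illustration under that assumption; `engine_get`, `engine_put` and `driver_can_unload` are hypothetical names, and the real code pins driver modules rather than consulting a flag.

```c
#include <stdbool.h>

/* Number of clients that currently want dmaengine channels. */
static int client_refs;

/* A client announces that it is about to use dmaengine. */
static void engine_get(void)
{
    client_refs++;
}

/* A client announces that it is done; never drops below zero. */
static void engine_put(void)
{
    if (client_refs > 0)
        client_refs--;
}

/* Driver removal is permitted only while no client holds a reference. */
static bool driver_can_unload(void)
{
    return client_refs == 0;
}
```

Because the count is taken once per client rather than once per transaction, the hot path never touches it, which is the cache-line bouncing concern raised in point 1/ of the message.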
| * dmaengine: remove dependency on async_tx  Dan Williams  2009-01-05
| |   async_tx.ko is a consumer of dma channels. A circular dependency arises if modules in drivers/dma rely on common code in async_tx.ko. It prevents either module from being unloaded.
| |
| |   Move dma_wait_for_async_tx and async_tx_run_dependencies to dmaengine.o where they should have been from the beginning.
| |
| |   Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
| |   Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * async_tx, dmaengine: document channel allocation and api rework  Dan Williams  2009-01-05
| |   "Wouldn't it be better if the dmaengine layer made sure it didn't pass the same channel several times to a client? I mean, you seem concerned that the memcpy() API should be transparent and easy to use, but the whole registration interface is just ridiculously complicated..." - Haavard
| |
| |   The dmaengine and async_tx registration/allocation interface is indeed needlessly complicated. This redesign has the following goals:
| |   1/ Simplify reference counting: dma channels are not something one would expect to be hotplugged, it should be an exceptional event handled by drivers not something clients should be mandated to handle in a callback. The common case channel removal event is 'rmmod <dma driver>', which for simplicity should be disallowed if the channel is in use.
| |   2/ Add an interface for requesting exclusive access to a channel suitable to device-to-memory users.
| |   3/ Convert all memory-to-memory users over to a common allocator, the goal here is to not have competing channel allocation schemes. The only competition should be between device-to-memory exclusive allocations and the memory-to-memory usage case where channels are shared between multiple "clients".
| |
| |   Cc: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
| |   Cc: Neil Brown <neilb@suse.de>
| |   Cc: Jeff Garzik <jeff@garzik.org>
| |   Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
| |   Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4  Linus Torvalds  2009-01-08
|\ \
| | |   * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (57 commits)
| | |     jbd2: Fix oops in jbd2_journal_init_inode() on corrupted fs
| | |     ext4: Remove "extents" mount option
| | |     block: Add Kconfig help which notes that ext4 needs CONFIG_LBD
| | |     ext4: Make printk's consistently prefixed with "EXT4-fs: "
| | |     ext4: Add sanity checks for the superblock before mounting the filesystem
| | |     ext4: Add mount option to set kjournald's I/O priority
| | |     jbd2: Submit writes to the journal using WRITE_SYNC
| | |     jbd2: Add pid and journal device name to the "kjournald2 starting" message
| | |     ext4: Add markers for better debuggability
| | |     ext4: Remove code to create the journal inode
| | |     ext4: provide function to release metadata pages under memory pressure
| | |     ext3: provide function to release metadata pages under memory pressure
| | |     add releasepage hooks to block devices which can be used by file systems
| | |     ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc
| | |     ext4: Init the complete page while building buddy cache
| | |     ext4: Don't allow new groups to be added during block allocation
| | |     ext4: mark the blocks/inode bitmap beyond end of group as used
| | |     ext4: Use new buffer_head flag to check uninit group bitmaps initialization
| | |     ext4: Fix the race between read_inode_bitmap() and ext4_new_inode()
| | |     ext4: code cleanup
| | |     ...
| * | jbd2: Fix oops in jbd2_journal_init_inode() on corrupted fs  Jan Kara  2009-01-06
| | |   On a 32-bit system with CONFIG_LBD, getblk() can fail because the provided block number is too big. Add error checks so we fail gracefully if getblk() returns NULL (which can also happen on memory allocation failures).
| | |
| | |   Thanks to David Maciejak from Fortinet's FortiGuard Global Security Research Team for reporting this bug.
| | |
| | |   http://bugzilla.kernel.org/show_bug.cgi?id=12370
| | |
| | |   Signed-off-by: Jan Kara <jack@suse.cz>
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | |   cc: stable@kernel.org
| * | ext4: Remove "extents" mount option  Theodore Ts'o  2009-01-06
| | |   This mount option is largely superfluous, and in fact the way it was implemented was buggy; if a filesystem which did not have the extents feature flag was mounted -o extents, the filesystem would attempt to create and use extents-based files even though the extents feature flag was not enabled. The simplest thing to do is to nuke the mount option entirely. It's not all that useful to force the non-creation of new extent-based files if the filesystem can support it.
| | |
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | block: Add Kconfig help which notes that ext4 needs CONFIG_LBD  Theodore Ts'o  2009-01-06
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | |   Cc: Jens Axboe <jens.axboe@oracle.com>
| * | ext4: Make printk's consistently prefixed with "EXT4-fs: "  Theodore Ts'o  2009-01-06
| | |   Previously, some were "ext4: ", and some were "EXT4: "; change them to be consistent with most ext4 printk's, which is to use "EXT4-fs: ".
| | |
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: Add sanity checks for the superblock before mounting the filesystem  Theodore Ts'o  2009-01-06
| | |   This avoids insane superblock configurations that could lead to kernel oops due to null pointer dereferences.
| | |
| | |   http://bugzilla.kernel.org/show_bug.cgi?id=12371
| | |
| | |   Thanks to David Maciejak at Fortinet's FortiGuard Global Security Research Team who discovered this bug independently (but at approximately the same time) as Thiemo Nagel, who submitted the patch.
| | |
| | |   Signed-off-by: Thiemo Nagel <thiemo.nagel@ph.tum.de>
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | |   Cc: stable@kernel.org
| * | ext4: Add mount option to set kjournald's I/O priority  Theodore Ts'o  2009-01-05
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | |   Cc: Jens Axboe <jens.axboe@oracle.com>
| * | jbd2: Submit writes to the journal using WRITE_SYNC  Theodore Ts'o  2009-01-04
| | |   Since we will be waiting for the write of the commit record to the journal to complete in journal_submit_commit_record(), submit it using WRITE_SYNC.
| | |
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | jbd2: Add pid and journal device name to the "kjournald2 starting" message  Theodore Ts'o  2009-01-03
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: Add markers for better debuggability  Theodore Ts'o  2009-01-03
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: Remove code to create the journal inode  Theodore Ts'o  2009-01-06
| | |   This code has been obsolete for quite some time, since the supported method for adding a journal inode is to use tune2fs (or to create a new filesystem with a journal via mke2fs or mkfs.ext4).
| | |
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: provide function to release metadata pages under memory pressure  Toshiyuki Okajima  2009-01-05
| | |   Pages in the page cache belonging to ext4 data files are released via the ext4_releasepage() function specified in the ext4 inode's address_space_ops. However, metadata blocks (such as indirect blocks, directory blocks, etc) are managed via the block device address_space_ops, and they can not be released by try_to_free_buffers() if they have a journal head attached to them.
| | |
| | |   To address this, we supply a release_metadata function which calls jbd2_journal_try_to_free_buffers() function to free the metadata, and which is called by the block device's blkdev_releasepage() function.
| | |
| | |   Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | |   Cc: linux-fsdevel@vger.kernel.org
| * | ext3: provide function to release metadata pages under memory pressure  Toshiyuki Okajima  2009-01-05
| | |   Pages in the page cache belonging to ext3 data files are released via the ext3_releasepage() function specified in the ext3 inode's address_space_ops. However, metadata blocks (such as indirect blocks, directory blocks, etc) are managed via the block device address_space_ops, and they can not be released by try_to_free_buffers() if they have a journal head attached to them.
| | |
| | |   To address this, we supply a try_to_free_pages() function which calls journal_try_to_free_buffers() function to free the metadata, and which is called by the block device's blkdev_releasepage() function.
| | |
| | |   Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | |   Cc: linux-fsdevel@vger.kernel.org
| * | add releasepage hooks to block devices which can be used by file systems  Theodore Ts'o  2009-01-03
| | |   Implement blkdev_releasepage() to release the buffer_heads and pages after we release private data belonging to a mounted filesystem.
| | |
| | |   Cc: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
| | |   Cc: linux-fsdevel@vger.kernel.org
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc  Aneesh Kumar K.V  2009-01-05
| | |   With nodelalloc option we need to update the dirty block counter on block allocation failure. This is needed because we increment the dirty block counter early in the block allocation phase. Without the patch s_dirty_blocks_counter goes wrong so that filesystem's free blocks decreases incorrectly.
| | |
| | |   Tested-by: Akira Fujita <a-fujita@rs.jp.nec.com>
| | |   Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | |   Cc: stable@kernel.org
| * | ext4: Init the complete page while building buddy cache  Aneesh Kumar K.V  2009-01-05
| | |   We need to init the complete page during buddy cache init by setting the contents to '1'. Otherwise we can see the following errors after doing an online resize of the filesystem:
| | |
| | |     EXT4-fs error (device sdb1): ext4_mb_mark_diskspace_used: Allocating block 1040385 in system zone of 127 group
| | |
| | |   Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | |   Cc: stable@kernel.org
| * | ext4: Don't allow new groups to be added during block allocation  Aneesh Kumar K.V  2009-01-05
| | |   After we mark the blocks in the buddy cache as allocated, we need to ensure that we don't reinit the buddy cache until the block bitmap is updated. This commit achieves this by holding the group_info alloc_semaphore till ext4_mb_release_context.
| | |
| | |   Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | |   Cc: stable@kernel.org
| * | ext4: mark the blocks/inode bitmap beyond end of group as used  Aneesh Kumar K.V  2009-01-05
| | |   We need to mark the block/inode bitmap beyond the end of the group with '1'.
| | |
| | |   Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | |   Cc: stable@kernel.org
| * | ext4: Use new buffer_head flag to check uninit group bitmaps initialization  Aneesh Kumar K.V  2009-01-05
| | |   For uninit block group, the on-disk bitmap is not initialized. That implies we cannot depend on the uptodate flag on the bitmap buffer_head to find bitmap validity. Use a new buffer_head flag which would be set after we properly initialize the bitmap. This also prevents (re-)initializing the uninit group bitmap every time we call ext4_read_block_bitmap().
| | |
| | |   Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | |   Cc: stable@kernel.org
| * | ext4: Fix the race between read_inode_bitmap() and ext4_new_inode()  Aneesh Kumar K.V  2009-01-05
| | |   We need to make sure we update the inode bitmap and clear EXT4_BG_INODE_UNINIT flag with sb_bgl_lock held, since ext4_read_inode_bitmap() looks at EXT4_BG_INODE_UNINIT to decide whether to initialize the inode bitmap each time it is called. (introduced by commit c806e68f.)
| | |
| | |   ext4_read_inode_bitmap does:
| | |
| | |     spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
| | |     if (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
| | |             ext4_init_inode_bitmap(sb, bh, block_group, desc);
| | |
| | |   and ext4_new_inode does
| | |
| | |     if (!ext4_set_bit_atomic(sb_bgl_lock(sbi, group),
| | |                              ino, inode_bitmap_bh->b_data))
| | |             ......
| | |             ...
| | |     spin_lock(sb_bgl_lock(sbi, group));
| | |     gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
| | |
| | |   i.e., on allocation we update the bitmap then we take the sb_bgl_lock and clear the EXT4_BG_INODE_UNINIT flag. What can happen is a parallel ext4_read_inode_bitmap can zero out the bitmap in between the above ext4_set_bit_atomic and spin_lock(sb_bg_lock..)
| | |
| | |   The race results in below user visible errors
| | |
| | |     EXT4-fs error (device sdb1): ext4_free_inode: bit already cleared for inode 168449
| | |     EXT4-fs warning (device sdb1): ext4_unlink: Deleting nonexistent file ...
| | |     EXT4-fs warning (device sdb1): ext4_rmdir: empty directory has too many links ...
| | |
| | |     # ls -al /mnt/tmp/f/p369/d3/d6/d39/db2/dee/d10f/d3f/l71
| | |     ls: /mnt/tmp/f/p369/d3/d6/d39/db2/dee/d10f/d3f/l71: Stale NFS file handle
| | |
| | |   Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | |   Cc: stable@kernel.org
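The interleaving described in this message can be replayed deterministically in user space. The sketch below is an illustration under simplifying assumptions, not ext4 code: `struct group`, `read_bitmap`, `alloc_racy` and `alloc_fixed` are hypothetical names, the lock is represented only by comments, and the "parallel" reader is invoked at the exact point where the race window opens.

```c
#include <stdbool.h>
#include <string.h>

#define BITMAP_BYTES 4

/* Toy model of one block group: an "uninit" flag plus its inode bitmap. */
struct group {
    bool uninit;                          /* models EXT4_BG_INODE_UNINIT */
    unsigned char bitmap[BITMAP_BYTES];   /* models the bitmap buffer */
};

/* Reader side: while the flag is set, (re)initialize the bitmap to zero. */
static void read_bitmap(struct group *g)
{
    /* the real reader holds the group lock around this check */
    if (g->uninit)
        memset(g->bitmap, 0, sizeof(g->bitmap));
}

static void set_bit_in(struct group *g, int bit)
{
    g->bitmap[bit / 8] |= (unsigned char)(1u << (bit % 8));
}

static bool test_bit_in(const struct group *g, int bit)
{
    return (g->bitmap[bit / 8] >> (bit % 8)) & 1u;
}

/* Buggy order: set the bit, let a reader run, only then clear the flag. */
static bool alloc_racy(struct group *g, int bit)
{
    set_bit_in(g, bit);
    read_bitmap(g);        /* reader slips into the window and zeroes the bitmap */
    g->uninit = false;
    return test_bit_in(g, bit);
}

/* Fixed order: bit update and flag clear form one critical section. */
static bool alloc_fixed(struct group *g, int bit)
{
    /* lock */
    set_bit_in(g, bit);
    g->uninit = false;
    /* unlock */
    read_bitmap(g);        /* reader can only run before or after, not between */
    return test_bit_in(g, bit);
}
```

In the racy order the freshly allocated bit is lost, which matches the "bit already cleared" error quoted in the message; in the fixed order the reader sees the flag already cleared and leaves the bitmap alone.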
| * | ext4: code cleanup  Aneesh Kumar K.V  2009-01-03
| | |   Rename some variables. We also unlock locks in the reverse order we acquired as a part of cleanup.
| | |
| | |   Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: Use high 16 bits of the block group descriptor's free counts fields  Aneesh Kumar K.V  2009-01-05
| | |   Rename the lower bits with suffix _lo and add helper to access the values. Also rename bg_itable_unused_hi to bg_pad as in e2fsprogs.
| | |
| | |   Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
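The "_lo" rename and helper mentioned in this message amount to treating two 16-bit fields as one 32-bit counter. The following is a minimal sketch of such an accessor pair, assuming host-endian fields and a descriptor that always carries the high half; `struct group_desc`, `free_blocks_count` and `set_free_blocks_count` are hypothetical names, and the on-disk little-endian conversion and descriptor-size check that real code would need are deliberately left out.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down descriptor holding only the split counter. */
struct group_desc {
    uint16_t free_blocks_count_lo;   /* low 16 bits of the count */
    uint16_t free_blocks_count_hi;   /* high 16 bits of the count */
};

/* Combine the two halves into the full 32-bit free block count. */
static uint32_t free_blocks_count(const struct group_desc *gd)
{
    return (uint32_t)gd->free_blocks_count_lo |
           ((uint32_t)gd->free_blocks_count_hi << 16);
}

/* Split a 32-bit count back into its low and high halves. */
static void set_free_blocks_count(struct group_desc *gd, uint32_t count)
{
    gd->free_blocks_count_lo = (uint16_t)(count & 0xffff);
    gd->free_blocks_count_hi = (uint16_t)(count >> 16);
}
```

Routing every access through helpers like these keeps callers unaware of the split, so a count above 65535 no longer silently truncates.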
| * | ext4: Fix race between read_block_bitmap() and mark_diskspace_used()  Aneesh Kumar K.V  2009-01-05
| | |   We need to make sure we update the block bitmap and clear EXT4_BG_BLOCK_UNINIT flag with sb_bgl_lock held, since ext4_read_block_bitmap() looks at EXT4_BG_BLOCK_UNINIT to decide whether to initialize the block bitmap each time it is called (introduced by commit c806e68f), and this can race with block allocations in ext4_mb_mark_diskspace_used().
| | |
| | |   ext4_read_block_bitmap does:
| | |
| | |     spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
| | |     if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
| | |             ext4_init_block_bitmap(sb, bh, block_group, desc);
| | |
| | |   Now on the block allocation side we do
| | |
| | |     mb_set_bits(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group), bitmap_bh->b_data,
| | |                 ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
| | |     ....
| | |     spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
| | |     if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
| | |             gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
| | |
| | |   ie on allocation we update the bitmap then we take the sb_bgl_lock and clear the EXT4_BG_BLOCK_UNINIT flag. What can happen is a parallel ext4_read_block_bitmap can zero out the bitmap in between the above mb_set_bits and spin_lock(sb_bg_lock..)
| | |
| | |   The race results in below user visible errors
| | |
| | |     EXT4-fs error (device sdb1): ext4_mb_release_inode_pa: free 100, pa_free 105
| | |     EXT4-fs error (device sdb1): mb_free_blocks: double-free of inode 0's block ..
| | |
| | |   Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | |   Cc: stable@kernel.org
| * | ext4: fix BUG when calling ext4_error with locked block group  Aneesh Kumar K.V  2009-01-05
| | |   The mballoc code likes to call ext4_error while it is holding locked block groups. This can cause a scheduling-in-atomic-context BUG. We can't just unlock the block group and relock it after/if ext4_error returns since that might result in race conditions in the case where the filesystem is set to continue after finding errors.
| | |
| | |   Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: Fix lockdep recursive locking warning  Aneesh Kumar K.V  2008-11-23
| | |   In ext4_mb_init_group(), if the filesystem block size is less than PAGE_SIZE/2, the code tries to grab alloc_sem for multiple block groups in a loop. We need to allow for this by using down_write_nested() and passing in the loop index as a lock subclass number. This works because no other code path needs to take multiple alloc_sem's.
| | |
| | |   Note that lockdep will fail for filesystem blocksize smaller than PAGE_SIZE/16k. (e.g., a 1k filesystem blocksize with a 32k page size, or a 2k filesystem blocksize with a 64k page size, etc.)
| | |
| | |   Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: don't use blocks freed but not yet committed in buddy cache init  Aneesh Kumar K.V  2009-01-05
| | |   When we generate buddy cache (especially during resize) we need to make sure we don't use the blocks freed but not yet committed. This makes sure we have the right value of free blocks count in the group info and also in the bitmap. This also ensures the ordered mode consistency.
| | |
| | |   Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | |   Cc: stable@kernel.org
| * | jbd2: Call journal commit callback without holding j_list_lock  Aneesh Kumar K.V  2008-11-06
| | |   Avoid freeing the transaction in __jbd2_journal_drop_transaction() so the journal commit callback can run without holding j_list_lock, to avoid lock contention on this spinlock.
| | |
| | |   Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
| | |   Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>