aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
...
| * blkio: Export disk time and sectors used by a group to user spaceVivek Goyal2009-12-03
| | | | | | | | | | | | | | | | | | | | | | o Export disk time and sector used by a group to user space through cgroup interface. o Also export a "dequeue" interface to cgroup which keeps track of how many a times a group was deleted from service tree. Helps in debugging. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * blkio: Some debugging aids for CFQVivek Goyal2009-12-03
| | | | | | | | | | | | | | o Some debugging aids for CFQ. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * blkio: Take care of cgroup deletion and cfq group reference countingVivek Goyal2009-12-03
| | | | | | | | | | | | | | | | | | | | o One can choose to change elevator or delete a cgroup. Implement group reference counting so that both elevator exit and cgroup deletion can take place gracefully. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Nauman Rafique <nauman@google.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * blkio: Dynamic cfq group creation based on cgroup tasks belongs toVivek Goyal2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | o Determine the cgroup IO submitting task belongs to and create the cfq group if it does not exist already. o Also link cfqq and associated cfq group. o Currently all async IO is mapped to root group. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * blkio: Group time used accounting and workload context save restoreVivek Goyal2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | o This patch introduces the functionality to do the accounting of group time when a queue expires. This time used decides which is the group to go next. o Also introduce the functionlity to save and restore the workload type context with-in group. It might happen that once we expire the cfq queue and group, a different group will schedule in and we will lose the context of the workload type. Hence save and restore it upon queue expiry. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * blkio: Implement per cfq group latency target and busy queue avgVivek Goyal2009-12-03
| | | | | | | | | | | | | | | | | | | | | | o So far we had 300ms soft target latency system wide. Now with the introduction of cfq groups, divide that latency by number of groups so that one can come up with group target latency which will be helpful in determining the workload slice with-in group and also the dynamic slice length of the cfq queue. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * blkio: Introduce per cfq group weights and vdisktime calculationsVivek Goyal2009-12-03
| | | | | | | | | | | | | | | | | | o Bring in the per cfq group weight and how vdisktime is calculated for the group. Also bring in the functionality of updating the min_vdisktime of the group service tree. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * blkio: Introduce blkio controller cgroup interfaceVivek Goyal2009-12-03
| | | | | | | | | | | | | | | | | | o This is basic implementation of blkio controller cgroup interface. This is the common interface visible to user space and should be used by different IO control policies as we implement those. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * blkio: Introduce the root service tree for cfq groupsVivek Goyal2009-12-03
| | | | | | | | | | | | | | | | | | o So far we just had one cfq_group in cfq_data. To create space for more than one cfq_group, we need to have a service tree of groups where all the groups can be queued if they have active cfq queues backlogged in these. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * blkio: Keep queue on service tree until we expire itVivek Goyal2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | o Currently cfqq deletes a queue from service tree if it is empty (even if we might idle on the queue). This patch keeps the queue on service tree hence associated group remains on the service tree until we decide that we are not going to idle on the queue and expire it. o This just helps in time accounting for queue/group and in implementation of rest of the patches. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * blkio: Implement macro to traverse each service tree in groupVivek Goyal2009-12-03
| | | | | | | | | | | | | | | | | | | | | | o Implement a macro to traverse each service tree in the group. This avoids usage of double for loop and special condition for idle tree 4 times. o Macro is little twisted because of special handling of idle class service tree. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * blkio: Introduce the notion of cfq groupsVivek Goyal2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | o This patch introduce the notion of cfq groups. Soon we will can have multiple groups of different weights in the system. o Various service trees (prioclass and workload type trees), will become per cfq group. So hierarchy looks as follows. cfq_groups | workload type | cfq queue o When an scheduling decision has to be taken, first we select the cfq group then workload with-in the group and then cfq queue with-in the workload type. o This patch just makes various workload service tree per cfq group and introduce the function to be able to choose a group for scheduling. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * blkio: Set must_dispatch only if we decided to not dispatch the requestVivek Goyal2009-12-03
| | | | | | | | | | | | | | | | o must_dispatch flag should be set only if we decided not to run the queue and dispatch the request. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * drbd_req.c: use part_[inc|dec]_in_flight()Philipp Reisner2009-12-03
| | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * writeback: remove unused nonblocking and congestion checksWu Fengguang2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - no one is calling wb_writeback and write_cache_pages with wbc.nonblocking=1 any more - lumpy pageout will want to do nonblocking writeback without the congestion wait So remove the congestion checks as suggested by Chris. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Evgeniy Polyakov <zbr@ioremap.net> Cc: Alex Elder <aelder@sgi.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * writeback: introduce wbc.for_backgroundWu Fengguang2009-12-03
| | | | | | | | | | | | | | | | | | | | It will lower the flush priority for NFS, and maybe more in future. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * writeback: remove the always false bdi_cap_writeback_dirty() testWu Fengguang2009-12-03
| | | | | | | | | | | | | | | | | | | | | | This is dead code because no bdi flush thread will be started for !bdi_cap_writeback_dirty bdi. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * flusher: Fix PF_FROZEN raceOGAWA Hirofumi2009-12-03
| | | | | | | | | | | | | | | | | | | | | | To touch task->flags directly is racy. thaw_process() still has race (changing non_current->flags, but this is another issue) though, I think it's much better off. So, use thaw_process() instead. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * Merge branch 'master' into for-2.6.33Jens Axboe2009-12-03
| |\
| * | cfq-iosched: no dispatch limit for single queueShaohua Li2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 2f5cb7381b737e24c8046fd4aeab571fb71315f5, each queue can send up to 4 * 4 requests if only one queue exists. I wonder why we have such limit. Device supports tag can send more requests. For example, AHCI can send 31 requests. Test (direct aio randread) shows the limits reduce about 4% disk thoughput. On the other hand, since we send one request one time, if other queue pop when current is sending more than cfq_quantum requests, current queue will stop send requests soon after one request, so sounds there is no big latency. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | block: Allow devices to indicate whether discarded blocks are zeroedMartin K. Petersen2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The discard ioctl is used by mkfs utilities to clear a block device prior to putting metadata down. However, not all devices return zeroed blocks after a discard. Some drives return stale data, potentially containing old superblocks. It is therefore important to know whether discarded blocks are properly zeroed. Both ATA and SCSI drives have configuration bits that indicate whether zeroes are returned after a discard operation. Implement a block level interface that allows this information to be bubbled up the stack and queried via a new block device ioctl. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | Revert "cfq: Make use of service count to estimate the rb_key offset"Jens Axboe2009-11-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 3586e917f2c7df769d173c4ec99554cb40a911e5. Corrado Zoccolo <czoccolo@gmail.com> correctly points out, that we need consistency of rb_key offset across groups. This means we cannot properly use the per-service_tree service count. Revert this change. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | cfq-iosched: fix corner cases in idling logicCorrado Zoccolo2009-11-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Idling logic was disabled in some corner cases, leading to unfair share for noidle queues. * the idle timer was not armed if there were other requests in the driver. unfortunately, those requests could come from other workloads, or queues for which we don't enable idling. So we will check only pending requests from the active queue * rq_noidle check on no-idle queue could disable the end of tree idle if the last completed request was rq_noidle. Now, we will disable that idle only if all the queues served in the no-idle tree had rq_noidle requests. Reported-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | cfq-iosched: idling on deep seeky sync queuesCorrado Zoccolo2009-11-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Seeky sync queues with large depth can gain unfairly big share of disk time, at the expense of other seeky queues. This patch ensures that idling will be enabled for queues with I/O depth at least 4, and small think time. The decision to enable idling is sticky, until an idle window times out without seeing a new request. The reasoning behind the decision is that, if an application is using large I/O depth, it is already optimized to make full utilization of the hardware, and therefore we reserve a slice of exclusive use for it. Reported-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | cfq-iosched: fix no-idle preemption logicCorrado Zoccolo2009-11-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An incoming no-idle queue should preempt the active no-idle queue only if the active queue is idling due to service tree empty. Previous code was buggy in two ways: * it relied on service_tree field to be set on the active queue, while it is not set when the code is idling for a new request * it didn't check for the service tree empty condition, so could lead to LIFO behaviour if multiple queues with depth > 1 were preempting each other on an non-NCQ device. Reported-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | cfq-iosched: fix ncq detection codeCorrado Zoccolo2009-11-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CFQ's detection of queueing devices initially assumes a queuing device and detects if the queue depth reaches a certain threshold. However, it will reconsider this choice periodically. Unfortunately, if device is considered not queuing, CFQ will force a unit queue depth for some workloads, thus defeating the detection logic. This leads to poor performance on queuing hardware, since the idle window remains enabled. Given this premise, switching to hw_tag = 0 after we have proved at least once that the device is NCQ capable is not a good choice. The new detection code starts in an indeterminate state, in which CFQ behaves as if hw_tag = 1, and then, if for a long observation period we never saw large depth, we switch to hw_tag = 0, otherwise we stick to hw_tag = 1, without reconsidering it again. Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | Merge branch 'for-jens' of git://git.drbd.org/linux-2.6-drbd into for-2.6.33Jens Axboe2009-11-26
| |\ \
| | * | drbd: moved CN_IDX_DRBD and CN_VAL_DRBD to the right filePhilipp Reisner2009-11-25
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| | * | DRBD: Now the code is 8.3.6 + 3 fixes (without compat crap)Philipp Reisner2009-11-24
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| | * | Fixed a regression in resync decission code drbd_uuid_compare() [Bugz 260]Philipp Reisner2009-11-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since 8.3.3 we fail to do the resync when a partial resynch is not possible, but a full synch is necessary. This regression was introduced with 7101539930c0a89146959e7a39c09ad9c3516434 Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| | * | add missing state change on corrupt packet header in drbd_recv_headerLars Ellenberg2009-11-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Otherwise the 'state fixup' in the receiver will change to Unconnected, but the receiver will terminate itself, and any attempt at 'down'ing that drbd later will block forever. see also Bugz. #259 Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| | * | fix in-kernel configuration serializationLars Ellenberg2009-11-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | this is uncritical, as we still also serialize in userland, but to correctly serialize on the CONFIG_PENDING bit, it must be wait_event(state_wait, \!test_and_set_bit) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * | | Fix regression in direct writes performance due to WRITE_ODIRECT flag removalVivek Goyal2009-11-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There seems to be a regression in direct write path due to following commit in for-2.6.33 branch of block tree. commit 1af60fbd759d31f565552fea315c2033947cfbe6 Author: Jeff Moyer <jmoyer@redhat.com> Date: Fri Oct 2 18:56:53 2009 -0400 block: get rid of the WRITE_ODIRECT flag Marking direct writes as WRITE_SYNC_PLUG instead of WRITE_ODIRECT, sets the NOIDLE flag in bio and hence in request. This tells CFQ to not expect more request from the queue and not idle on it (despite the fact that queue's think time is less and it is not seeky). So direct writers lose big time when competing with sequential readers. Using fio, I have run one direct writer and two sequential readers and following are the results with 2.6.32-rc7 kernel and with for-2.6.33 branch. Test ==== 1 direct writer and 2 sequential reader running simultaneously. [global] directory=/mnt/sdc/fio/ runtime=10 [seqwrite] rw=write size=4G direct=1 [seqread] rw=read size=2G numjobs=2 2.6.32-rc7 ========== direct writes: aggrb=2,968KB/s readers : aggrb=101MB/s for-2.6.33 branch ================= direct write: aggrb=19KB/s readers aggrb=137MB/s This patch brings back the WRITE_ODIRECT flag, with the difference that we don't set the BIO_RW_UNPLUG flag so that device is not unplugged after submission of request and an explicit unplug from submitter is required. That way we fix the jeff's issue of not enough merging taking place in aio path as well as make sure direct writes get their fair share. After the fix ============= for-2.6.33 + fix ---------------- direct writes: aggrb=2,728KB/s reads: aggrb=103MB/s Thanks Vivek Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | | cfq-iosched: cleanup unreachable codeCorrado Zoccolo2009-11-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cfq_should_idle returns false for no-idle queues that are not the last, so the control flow will never reach the removed code in a state that satisfies the if condition. The unreachable code was added to emulate previous cfq behaviour for non-NCQ rotational devices. My tests show that even without it, the performances and fairness are comparable with previous cfq, thanks to the fact that all seeky queues are grouped together, and that we idle at the end of the tree. Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | | block: add helpers to run flush_dcache_page() against a bio and a request's ↵Ilya Loginov2009-11-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pages Mtdblock driver doesn't call flush_dcache_page for pages in request. So, this causes problems on architectures where the icache doesn't fill from the dcache or with dcache aliases. The patch fixes this. The ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE symbol was introduced to avoid pointless empty cache-thrashing loops on architectures for which flush_dcache_page() is a no-op. Every architecture was provided with this flush pages on architectires where ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE is equal 1 or do nothing otherwise. See "fix mtd_blkdevs problem with caches on some architectures" discussion on LKML for more information. Signed-off-by: Ilya Loginov <isloginov@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Peter Horton <phorton@bitbox.co.uk> Cc: "Ed L. Cashin" <ecashin@coraid.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | | cfq: Make use of service count to estimate the rb_key offsetGui Jianfeng2009-11-26
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For the moment, different workload cfq queues are put into different service trees. But CFQ still uses "busy_queues" to estimate rb_key offset when inserting a cfq queue into a service tree. I think this isn't appropriate, and it should make use of service tree count to do this estimation. This patch is for for-2.6.33 branch. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | cciss: change Cmd_sg_list.sg_chain_dma type to dma_addr_tAlex Chiang2009-11-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A recent commit broke the ia64 build: Author: Don Brace <brace@beardog.cce.hp.com> Date: Thu Nov 12 12:50:01 2009 -0600 cciss: Add enhanced scatter-gather support. because of this hunk: --- a/drivers/block/cciss.h +++ b/drivers/block/cciss.h +struct Cmd_sg_list { + SGDescriptor_struct *sgchain; + dma64_addr_t sg_chain_dma; + int chain_block_size; +}; The issue is that dma64_addr_t isn't #define'd on ia64. The way that we're using Cmd_sg_list.sg_chain_dma is to hold an address returned from pci_map_single(). + temp64.val = pci_map_single(h->pdev, + h->cmd_sg_list[c->cmdindex]->sgchain, + len, dir); + + h->cmd_sg_list[c->cmdindex]->sg_chain_dma = temp64.val; pci_map_single() returns a dma_addr_t too. This code will still work even on a 32-bit x86 build, where dma_addr_t is defined to be a u32 because it will simply be promoted to the __u64 that temp64.val is defined as. Thus, declaring Cmd_sg_list.sg_chain_dma as dma_addr_t is safe. Cc: Don Brace <brace@beardog.cce.hp.com> Cc: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | cciss: fix scatter gather cleanup problemsStephen M. Cameron2009-11-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On driver unload, only free up the extra scatter gather data if they were allocated in the first place (the controller supports it) and don't forget to free up the sg_cmd_list array of pointers. Signed-off-by: Don Brace <brace@beardog.cce.hp.com> Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | partitions: read whole sector with EFI GPT headerKarel Zak2009-11-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The size of EFI GPT header is not static, but whole sector is allocated for the header. The HeaderSize field must be greater than 92 (= sizeof(struct gpt_header) and must be less than or equal to the logical block size. It means we have to read whole sector with the header, because the header crc32 checksum is calculated according to HeaderSize. For more details see UEFI standard (version 2.3, May 2009): - 5.3.1 GUID Format overview, page 93 - Table 13. GUID Partition Table Header, page 96 Signed-off-by: Karel Zak <kzak@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | partitions: use sector size for EFI GPTKarel Zak2009-11-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, kernel uses strictly 512-byte sectors for EFI GPT parsing. That's wrong. UEFI standard (version 2.3, May 2009, 5.3.1 GUID Format overview, page 95) defines that LBA is always based on the logical block size. It means bdev_logical_block_size() (aka BLKSSZGET) for Linux. This patch removes static sector size from EFI GPT parser. The problem is reproducible with the latest GNU Parted: # modprobe scsi_debug dev_size_mb=50 sector_size=4096 # ./parted /dev/sdb print Model: Linux scsi_debug (scsi) Disk /dev/sdb: 52.4MB Sector size (logical/physical): 4096B/4096B Partition Table: gpt Number Start End Size File system Name Flags 1 24.6kB 3002kB 2978kB primary 2 3002kB 6001kB 2998kB primary 3 6001kB 9003kB 3002kB primary # blockdev --rereadpt /dev/sdb # dmesg | tail -1 sdb: unknown partition table <---- !!! with this patch: # blockdev --rereadpt /dev/sdb # dmesg | tail -1 sdb: sdb1 sdb2 sdb3 Signed-off-by: Karel Zak <kzak@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | cciss: Fix weird usage of ENXIO in cciss_scsi.cStephen M. Cameron2009-11-13
| | | | | | | | | | | | | | | | | | | | | cciss: Fix weird usage of ENXIO in cciss_scsi.c Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | cciss: Add enhanced scatter-gather support.Don Brace2009-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cciss: Add enhanced scatter-gather support. For controllers which supported, more than 512 scatter-gather elements per command may be used, and the max transfer size can be increased to 8192 blocks. Signed-off-by: Don Brace <brace@beardog.cce.hp.com> Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | cciss: Do not automatically rescan on UNIT ATTENTION/LUN DATA CHANGEDStephen M. Cameron2009-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cciss: Do not automatically rescan on UNIT ATTENTION/LUN DATA CHANGED There are problems with doing this. If, say, several logical drives are deleted at once, several such UNIT ATTENTIONS will be encountered, often during the rescan triggered by the first such UNIT ATTENTION. The block layer may be in the midst of trying to add logical drives which were just deleted (resulting in the subsequent UNIT ATTENTION(s).) Making the rescan code robust enough to tolerate this kind of thing is too complicated for the moment. So, for now, we just don't do it. Note: This UNIT ATTENTION/LUN DATA CHANGED situation only occurs on the MSA2012. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | cciss: Remove unnecessary check in scan_threadStephen M. Cameron2009-11-13
| | | | | | | | | | | | | | | | | | | | | cciss: Remove unnecessary check in scan_thread Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | cciss: fix typo that causes scsi status to be lost.Stephen M. Cameron2009-11-13
| | | | | | | | | | | | | | | | | | | | | cciss: fix typo that causes scsi status to be lost. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | cciss: remove sendcmd() as it is no longer used.Stephen M. Cameron2009-11-13
| | | | | | | | | | | | | | | | | | | | | cciss: remove sendcmd() as it is no longer used. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | cciss: clean up code in cciss_shutdownStephen M. Cameron2009-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | cciss: clean up code in cciss_shutdown. Send the flush cache command down with interrupts still enabled, and do not do DMA from the stack. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | cciss: Remove the "withirq" parameter from various functions where possibleStephen M. Cameron2009-11-13
| | | | | | | | | | | | | | | | | | | | | cciss: Remove the "withirq" parameter from various functions where possible Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | cciss: Retry driver initiated cmds with unit attention conditionStephen M. Cameron2009-11-13
| | | | | | | | | | | | | | | | | | | | | cciss: Retry driver initiated cmds with unit attention condition Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | cciss: Fix problem with remove_from_scan_list on driver unloadStephen M. Cameron2009-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | cciss: Fix problem with remove_from_scan_list that on driver unload it doesn't remove the controller from the scan list correctly if the controller is currently being scanned for new devices. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>