aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* md/raid1: fix confusing 'redirect sector' message.NeilBrown2010-05-18
| | | | | | | | | | | This message seems to suggest the named device is the one on which a read failed, however it is actually the device that the read will be redirected to. So make the message a little clearer. Reported-by: Tim Burgess <ozburgess@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
* md: don't unregister the thread in mddev_suspendNeilBrown2010-05-18
| | | | | | | | | | This is - unnecessary because mddev_suspend is always followed by a call to ->stop, and each ->stop unregisters the thread, and - a problem as it makes it awkwards to suspend and then resume a device as we will want later. Signed-off-by: NeilBrown <neilb@suse.de>
* md: factor out init code for an mddevNeilBrown2010-05-18
| | | | | | | This is a simple factorisation that makes mddev_find easier to read. Signed-off-by: NeilBrown <neilb@suse.de>
* md: pass mddev to make_request functions rather than request_queueNeilBrown2010-05-18
| | | | | | | | | | We used to pass the personality make_request function direct to the block layer so the first argument had to be a queue. But now we have the intermediary md_make_request so it makes at lot more sense to pass a struct mddev_s. It makes it possible to have an mddev without its own queue too. Signed-off-by: NeilBrown <neilb@suse.de>
* md: call md_stop_writes from md_stopNeilBrown2010-05-18
| | | | | | | | | | This moves the call to the other side of set_readonly, but that should not be an issue. This encapsulates in 'md_stop' all of the functionality for internally stopping the array, leaving all the interactions with externalities (sysfs, request_queue, gendisk) in do_md_stop. Signed-off-by: NeilBrown <neilb@suse.de>
* md: split md_set_readonly out of do_md_stopNeilBrown2010-05-18
| | | | | | | | Using do_md_stop to set an array to read-only is a little confusing. Now most of the common code has been factored out, split md_set_readonly off in to a separate function. Signed-off-by: NeilBrown <neilb@suse.de>
* md: factor md_stop_writes out of do_md_stop.NeilBrown2010-05-18
| | | | | | | | | | | | | | | | | | Further refactoring of do_md_stop. This one requires some explanation as it takes code from different places in do_md_stop, so some re-ordering happens. We only get into this part of do_md_stop if there are no active opens of the device, so no writes can be happening and the device must have been flushed. In md_stop_writes we want to stop any internal sources of writes - i.e. resync - and flush out the metadata. The only code that was previously before some of this code is code to clean up the queue, the mddev, the gendisk, or sysfs, all of which is probably better after code that makes active changes (i.e. triggers writes). Signed-off-by: NeilBrown <neilb@suse.de>
* md: start to refactor do_md_stopNeilBrown2010-05-18
| | | | | | | | | do_md_stop is large and clunky, so hard to understand. This is a first step of refactoring, pulling two simple sub-functions out. Signed-off-by: NeilBrown <neilb@suse.de>
* md: factor do_md_run to separate accesses to ->gendiskNeilBrown2010-05-18
| | | | | | | | | | As part of relaxing the binding between an mddev and gendisk, we separate do_md_run into two functions. md_run does all the work internal to md do_md_run calls md_run and makes and changes to gendisk that are required. Signed-off-by: NeilBrown <neilb@suse.de>
* md: remove ->changed and related code.NeilBrown2010-05-18
| | | | | | | | | | | We set ->changed to 1 and call check_disk_change at the end of md_open so that bd_invalidated would be set and thus partition rescan would happen appropriately. Now that we call revalidate_disk directly, which sets bd_invalidates, that indirection is no longer needed and can be removed. Signed-off-by: NeilBrown <neilb@suse.de>
* md: don't reference gendisk in getgeoNeilBrown2010-05-18
| | | | | | | | Using ->array_sectors rather than get_capacity() is more direct and is a step towards relaxing the tight connection between mddev and gendisk. Signed-off-by: NeilBrown <neilb@suse.de>
* md: move io accounting out of personalities into md_make_requestNeilBrown2010-05-18
| | | | | | | | | While I generally prefer letting personalities do as much as possible, given that we have a central md_make_request anyway we may as well use it to simplify code. Also this centralises knowledge of ->gendisk which will help later. Signed-off-by: NeilBrown <neilb@suse.de>
* md/raid5: small tidyup in raid5_align_endioNeilBrown2010-05-18
| | | | | | | Diving through ->queue to find mddev is unnecessarily complex - there is an easier path to finding mddev, so use that. Signed-off-by: NeilBrown <neilb@suse.de>
* md: add support for raid5 to raid4 conversionNeilBrown2010-05-18
| | | | | | | | This is unlikely to be wanted, but we may as well provide it for completeness. Signed-off-by: NeilBrown <neilb@suse.de>
* md: notify level changes through sysfs.Maciej Trela2010-05-18
| | | | | | | | Level changes can be very significant, so make sure to notify them via sysfs. Signed-off-by: Maciej Trela <maciej.trela@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* md: Relax checks on ->max_disks when external metadata handling is used.NeilBrown2010-05-18
| | | | | | | | | When metadata is being managed by user-space, md doesn't know what the maximum number of devices allowed in an array is so ->max_disks is 0. In this case we should allow any (+ve) number of disks. Signed-off-by: NeilBrown <neilb@suse.de>
* md: Correctly handle device removal via sysfsMaciej Trela2010-05-18
| | | | | | | | | | Writing "none" to "../md/dev-xx/slot" removes that device from being an active part of the array, but it didn't set ->raid_disk to -1 to record this fact. Signed-off-by: Maciej Trela <Maciej.Trela@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* md: Add support for Raid0->Raid10 takeoverTrela, Maciej2010-05-18
| | | | | Signed-off-by: Maciej Trela <maciej.trela@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* md: Add support for Raid5->Raid0 and Raid10->Raid0 takeoverTrela, Maciej2010-05-18
| | | | | Signed-off-by: Maciej Trela <maciej.trela@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* md:Add support for Raid0->Raid5 takeoverTrela Maciej2010-05-18
| | | | | Signed-off-by: Maciej Trela <maciej.trela@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* md: don't use mddev->raid_disks in raid0 or raid10 while array is active.NeilBrown2010-05-18
| | | | | | | | | | | | | | In a subsequent patch we will make it possible to change mddev->raid_disks while a RAID0 or RAID10 array is active. This is part of the process of reshaping such an array. This means that we cannot use this value while processes requests (it is OK to use it during initialisation as we are locked against changes then). Both RAID0 and RAID10 have the same value stored in the private data structure, so use that value instead. Signed-off-by: NeilBrown <neilb@suse.de>
* md: discard StateChanged device flag.NeilBrown2010-05-18
| | | | | | | | This was needed when sysfs files could only be 'notified' from process context. Now that we have sys_notify_direct, we can call it directly from an interrupt. Signed-off-by: NeilBrown <neilb@suse.de>
* drivers/md: Remove unnecessary casts of void *H Hartley Sweeten2010-05-18
| | | | | | | void pointers do not need to be cast to other pointer types. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: NeilBrown <neilb@suse.de>
* md: expose max value of behind writes counterPaul Clements2010-05-18
| | | | | | | | | | | | | Keep track of the maximum number of concurrent write-behind requests for an md array and exposed this number in sysfs at md/bitmap/max_backlog_used Writing any value to this file will clear it. This allows userspace to be involved in tuning bitmap/backlog. Signed-off-by: Paul Clements <paul.clements@steeleye.com> Signed-off-by: NeilBrown <neilb@suse.de>
* md: remove some dead fields from mddev_sNeilBrown2010-05-18
| | | | | | | | | | | These fields have never been used. commit 4b6d287f627b5fb6a49f78f9e81649ff98c62bb7 added them, but also added identical files to bitmap_super_s, and only used the latter. So remove these unused fields. Signed-off-by: NeilBrown <neilb@suse.de>
* md/raid1: fix counting of write targets.NeilBrown2010-05-18
| | | | | | | | | | | | | | | | | | | There is a very small race window when writing to a RAID1 such that if a device is marked faulty at exactly the wrong time, the write-in-progress will not be sent to the device, but the bitmap (if present) will be updated to say that the write was sent. Then if the device turned out to still be usable as was re-added to the array, the bitmap-based-resync would skip resyncing that block, possibly leading to corruption. This would only be a problem if no further writes were issued to that area of the device (i.e. that bitmap chunk). Suitable for any pending -stable kernel. Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
* md: manage redundancy group in sysfs when changing level.NeilBrown2010-05-17
| | | | | | | | | | | | | | | | | | | | | Some levels expect the 'redundancy group' to be present, others don't. So when we change level of an array we might need to add or remove this group. This requires fixing up the current practice of overloading ->private to indicate (when ->pers == NULL) that something needs to be removed. So create a new ->to_remove to fill that role. When changing levels, we may need to add or remove attributes. When changing RAID5 -> RAID6, we both add and remove the same thing. It is important to catch this and optimise it out as the removal is delayed until a lock is released, so trying to add immediately would cause problems. Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
* md: remove unneeded sysfs files more promptlyNeilBrown2010-05-17
| | | | | | | | | | | | | | | | | | | | | | | | When an array is stopped we need to remove some sysfs files which are dependent on the type of array. We need to delay that deletion as deleting them while holding reconfig_mutex can lead to deadlocks. We currently delay them until the array is completely destroyed. However it is possible to deactivate and then reactivate the array. It is also possible to need to remove sysfs files when changing level, which can potentially happen several times before an array is destroyed. So we need to delete these files more promptly: as soon as reconfig_mutex is dropped. We need to ensure this happens before do_md_run can restart the array, so we use open_mutex for some extra locking. This is not deadlock prone. Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
* md/linear: avoid possible oops and array stopNeilBrown2010-05-17
| | | | | | | | | | | | | | Since commit ef286f6fa673cd7fb367e1b145069d8dbfcc6081 it has been important that each personality clears ->private in the ->stop() function, or sets it to a attribute group to be removed. linear.c doesn't. This can sometimes lead to an oops, though it doesn't always. Suitable for 2.6.33-stable and 2.6.34. Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org
* md: set mddev readonly flag on blkdev BLKROSET ioctlDan Williams2010-05-11
| | | | | | | | | | | | | | | When the user sets the block device to readwrite then the mddev should follow suit. Otherwise, the BUG_ON in md_write_start() will be set to trigger. The reverse direction, setting mddev->ro to match a set readonly request, can be ignored because the blkdev level readonly flag precludes the need to have mddev->ro set correctly. Nevermind the fact that setting mddev->ro to 1 may fail if the array is in use. Cc: <stable@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* md: deal with merge_bvec_fn in component devices better.NeilBrown2010-03-16
| | | | | | | | | | | | | | | | | If a component device has a merge_bvec_fn then as we never call it we must ensure we never need to. Currently this is done by setting max_sector to 1 PAGE, however this does not stop a bio being created with several sub-page iovecs that would violate the merge_bvec_fn. So instead set max_segments to 1 and set the segment boundary to the same as a page boundary to ensure there is only ever one single-page segment of IO requested at a time. This can particularly be an issue when 'xen' is used as it is known to submit multiple small buffers in a single bio. Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/amit/virtio-consoleLinus Torvalds2010-03-07
|\ | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/amit/virtio-console: virtio: console: Use better variable names for fill_queue operation virtio: console: Fix type of 'len' as unsigned int
| * virtio: console: Use better variable names for fill_queue operationAmit Shah2010-03-04
| | | | | | | | | | | | | | | | | | | | | | | | We want to keep track of the number of buffers added to a vq. Use nr_added_bufs instead of 'ret'. Also, the users of fill_queue() overloaded a local 'err' variable to check the numbers of buffers allocated. Use nr_added_bufs instead of err. Signed-off-by: Amit Shah <amit.shah@redhat.com> Reported-by: Juan Quintela <quintela@redhat.com>
| * virtio: console: Fix type of 'len' as unsigned intAmit Shah2010-03-04
| | | | | | | | | | | | | | | | We declare 'len' as int type but it should be 'unsigned int', as get_buf() wants it to be. Signed-off-by: Amit Shah <amit.shah@redhat.com> Reported-by: Juan Quintela <quintela@redhat.com>
* | Merge branch 'x86-mrst-for-linus' of ↵Linus Torvalds2010-03-07
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mrst-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (30 commits) x86, mrst: Fix whitespace breakage in apb_timer.c x86, mrst: Fix APB timer per cpu clockevent x86, mrst: Remove X86_MRST dependency on PCI_IOAPIC x86, olpc: Use pci subarch init for OLPC x86, pci: Add arch_init to x86_init abstraction x86, mrst: Add Kconfig dependencies for Moorestown x86, pci: Exclude Moorestown PCI code if CONFIG_X86_MRST=n x86, numaq: Make CONFIG_X86_NUMAQ depend on CONFIG_PCI x86, pci: Add sanity check for PCI fixed bar probing x86, legacy_irq: Remove duplicate vector assigment x86, legacy_irq: Remove left over nr_legacy_irqs x86, mrst: Platform clock setup code x86, apbt: Moorestown APB system timer driver x86, mrst: Add vrtc platform data setup code x86, mrst: Add platform timer info parsing code x86, mrst: Fill in PCI functions in x86_init layer x86, mrst: Add dummy legacy pic to platform setup x86/PCI: Moorestown PCI support x86, ioapic: Add dummy ioapic functions x86, ioapic: Early enable ioapic for timer irq ... Fixed up semantic conflict of new clocksources due to commit 17622339af25 ("clocksource: add argument to resume callback").
| * | x86, mrst: Fix whitespace breakage in apb_timer.cH. Peter Anvin2010-03-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Checkin bb24c4716185f6e116c440462c65c1f56649183b: "Moorestown APB system timer driver" suffered from severe whitespace damage in arch/x86/kernel/apb_timer.c due to using Microsoft Lookout to send a patch. Fix the whitespace breakage. Reported-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | x86, mrst: Fix APB timer per cpu clockeventJacob Pan2010-03-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current APB timer code incorrectly registers a static copy of the clockevent device for the boot CPU. The per cpu clockevent should be used instead. This bug was hidden by zero-initialized data; as such it did not get exposed in testing, but was discovered by code review. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> LKML-Reference: <1267592494-7723-1-git-send-email-jacob.jun.pan@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | x86, mrst: Remove X86_MRST dependency on PCI_IOAPICJacob Pan2010-03-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PCI_IOAPIC is used for PCI hotplug, Moorestown does not have ACPI PCI hotplug, as it does not have ACPI. This unnecessary dependency causes X86_MRST fail to be selected if ACPI is not selected. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> LKML-Reference: <1267550368-7435-1-git-send-email-jacob.jun.pan@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | x86, olpc: Use pci subarch init for OLPCThomas Gleixner2010-02-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the #ifdef'ed OLPC-specific init functions by a conditional x86_init function. If the function returns 0 we leave pci_arch_init, otherwise we continue. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Andres Salomon <dilinger@collabora.co.uk> LKML-Reference: <43F901BD926A4E43B106BF17856F0755A318CE89@orsmsx508.amr.corp.intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | x86, pci: Add arch_init to x86_init abstractionThomas Gleixner2010-02-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | Added an abstraction function for arch specific init calls. Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> LKML-Reference: <43F901BD926A4E43B106BF17856F0755A318CE84@orsmsx508.amr.corp.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | x86, mrst: Add Kconfig dependencies for MoorestownJacob Pan2010-02-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Moorestown platform requires IOAPIC for all interrupts from the south complex, since there is no legacy PIC. Furthermore, Moorestown I/O requires PCI. Moorestown PCI depends on PCI MMCONFIG and DIRECT method to perform device enumeration, as there is no PCI BIOS. [ hpa: rewrote commit message ] Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> LKML-Reference: <1267120934-9505-1-git-send-email-jacob.jun.pan@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | x86, pci: Exclude Moorestown PCI code if CONFIG_X86_MRST=nYinghai Lu2010-02-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | If we don't have any Moorestown CPU support compiled in, we don't need the Moorestown PCI support either. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4B858E89.7040807@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | x86, numaq: Make CONFIG_X86_NUMAQ depend on CONFIG_PCIPan, Jacob jun2010-02-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The NUMAQ initialization sets x86_init.pci.init to pci_numaq_init, which obviously isn't defined if CONFIG_PCI isn't defined. This dependency was implicit in the past, because pci_numaq_init was invoked from arch/x86/pci/legacy.c, which itself was conditioned on CONFIG_PCI. I suspect that no NUMA-Q machines without PCI were ever built, so instead of complicating the code by adding #ifdefs or stub functions, just disable this bit of the configuration space. [ hpa: rewrote the checkin comment ] Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com> LKML-Reference: <43F901BD926A4E43B106BF17856F0755A321EE1F@orsmsx508.amr.corp.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | x86, pci: Add sanity check for PCI fixed bar probingJacob Pan2010-02-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While probing for the PCI fixed BAR capability in the extended PCI configuration space we need to make sure raw_pci_ext_ops is actually initialized. Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com> LKML-Reference: <43F901BD926A4E43B106BF17856F0755A321E8F7@orsmsx508.amr.corp.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | x86, legacy_irq: Remove duplicate vector assigmentYinghai Lu2010-02-24
| | | | | | | | | | | | | | | | | | | | | | | | Remove duplicated cfg[i].vector assignment. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4B8493A0.6080501@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | x86, legacy_irq: Remove left over nr_legacy_irqsYinghai Lu2010-02-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nr_legacy_irqs and its ilk have moved to legacy_pic. -v2: there is one in ioapic_.c Singed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4B84AAC4.2020204@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | x86, mrst: Platform clock setup codeJacob Pan2010-02-24
| | | | | | | | | | | | | | | | | | | | | | | | Add Moorestown platform clock setup code to the x86_init abstraction. Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com> LKML-Reference: <43F901BD926A4E43B106BF17856F0755A318D2D4@orsmsx508.amr.corp.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | x86, apbt: Moorestown APB system timer driverJacob Pan2010-02-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Moorestown platform does not have PIT or HPET platform timers. Instead it has a bank of eight APB timers. The number of available timers to the os is exposed via SFI mtmr tables. All APB timer interrupts are routed via ioapic rtes and delivered as MSI. Currently, we use timer 0 and 1 for per cpu clockevent devices, timer 2 for clocksource. Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com> LKML-Reference: <43F901BD926A4E43B106BF17856F0755A318D2D2@orsmsx508.amr.corp.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | x86, mrst: Add vrtc platform data setup codeFeng Tang2010-02-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vRTC information is obtained from SFI tables on Moorestown, this patch parses these tables and assign the information. Signed-off-by: Feng Tang <feng.tang@intel.com> LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D0D@orsmsx508.amr.corp.intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | x86, mrst: Add platform timer info parsing codeJacob Pan2010-02-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | Moorestown platform timer information is obtained from SFI FW tables. This patch parses SFI table then assign the irq information to mp_irqs. Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com> LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D0B@orsmsx508.amr.corp.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>