aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* [PATCH] optional ZONE_DMA: deal with cases of ZONE_DMA meaning the first zoneChristoph Lameter2007-02-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patchset follows up on the earlier work in Andrew's tree to reduce the number of zones. The patches allow to go to a minimum of 2 zones. This one allows also to make ZONE_DMA optional and therefore the number of zones can be reduced to one. ZONE_DMA is usually used for ISA DMA devices. There are a number of reasons why we would not want to have ZONE_DMA 1. Some arches do not need ZONE_DMA at all. 2. With the advent of IOMMUs DMA zones are no longer needed. The necessity of DMA zones may drastically be reduced in the future. This patchset allows a compilation of a kernel without that overhead. 3. Devices that require ISA DMA get rare these days. All my systems do not have any need for ISA DMA. 4. The presence of an additional zone unecessarily complicates VM operations because it must be scanned and balancing logic must operate on its. 5. With only ZONE_NORMAL one can reach the situation where we have only one zone. This will allow the unrolling of many loops in the VM and allows the optimization of varous code paths in the VM. 6. Having only a single zone in a NUMA system results in a 1-1 correspondence between nodes and zones. Various additional optimizations to critical VM paths become possible. Many systems today can operate just fine with a single zone. If you look at what is in ZONE_DMA then one usually sees that nothing uses it. The DMA slabs are empty (Some arches use ZONE_DMA instead of ZONE_NORMAL, then ZONE_NORMAL will be empty instead). On all of my systems (i386, x86_64, ia64) ZONE_DMA is completely empty. Why constantly look at an empty zone in /proc/zoneinfo and empty slab in /proc/slabinfo? Non i386 also frequently have no need for ZONE_DMA and zones stay empty. The patchset was tested on i386 (UP / SMP), x86_64 (UP, NUMA) and ia64 (NUMA). The RFC posted earlier (see http://marc.theaimsgroup.com/?l=linux-kernel&m=115231723513008&w=2) had lots of #ifdefs in them. An effort has been made to minize the number of #ifdefs and make this as compact as possible. The job was made much easier by the ongoing efforts of others to extract common arch specific functionality. I have been running this for awhile now on my desktop and finally Linux is using all my available RAM instead of leaving the 16MB in ZONE_DMA untouched: christoph@pentium940:~$ cat /proc/zoneinfo Node 0, zone Normal pages free 4435 min 1448 low 1810 high 2172 active 241786 inactive 210170 scanned 0 (a: 0 i: 0) spanned 524224 present 524224 nr_anon_pages 61680 nr_mapped 14271 nr_file_pages 390264 nr_slab_reclaimable 27564 nr_slab_unreclaimable 1793 nr_page_table_pages 449 nr_dirty 39 nr_writeback 0 nr_unstable 0 nr_bounce 0 cpu: 0 pcp: 0 count: 156 high: 186 batch: 31 cpu: 0 pcp: 1 count: 9 high: 62 batch: 15 vm stats threshold: 20 cpu: 1 pcp: 0 count: 177 high: 186 batch: 31 cpu: 1 pcp: 1 count: 12 high: 62 batch: 15 vm stats threshold: 20 all_unreclaimable: 0 prev_priority: 12 temp_priority: 12 start_pfn: 0 This patch: In two places in the VM we use ZONE_DMA to refer to the first zone. If ZONE_DMA is optional then other zones may be first. So simply replace ZONE_DMA with zone 0. This also fixes ZONETABLE_PGSHIFT. If we have only a single zone then ZONES_PGSHIFT may become 0 because there is no need anymore to encode the zone number related to a pgdat. However, we still need a zonetable to index all the zones for each node if this is a NUMA system. Therefore define ZONETABLE_SHIFT unconditionally as the offset of the ZONE field in page flags. [apw@shadowen.org: fix mismerge] Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Andi Kleen <ak@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Matthew Wilcox <willy@debian.org> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] Drop get_zone_counts()Christoph Lameter2007-02-11
| | | | | | | | Values are available via ZVC sums. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] Drop __get_zone_counts()Christoph Lameter2007-02-11
| | | | | | | | Values are readily available via ZVC per node and global sums. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] Drop nr_free_pages_pgdat()Christoph Lameter2007-02-11
| | | | | | | | | Function is unnecessary now. We can use the summing features of the ZVCs to get the values we need. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] Drop free_pages()Christoph Lameter2007-02-11
| | | | | | | | | | | | | | | nr_free_pages is now a simple access to a global variable. Make it a macro instead of a function. The nr_free_pages now requires vmstat.h to be included. There is one occurrence in power management where we need to add the include. Directly refrer to global_page_state() there to clarify why the #include was added. [akpm@osdl.org: arm build fix] [akpm@osdl.org: sparc64 build fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] Reorder ZVCs according to cachelineChristoph Lameter2007-02-11
| | | | | | | | | | | | The global and per zone counter sums are in arrays of longs. Reorder the ZVCs so that the most frequently used ZVCs are put into the same cacheline. That way calculations of the global, node and per zone vm state touches only a single cacheline. This is mostly important for 64 bit systems were one 128 byte cacheline takes only 8 longs. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] Use ZVC for free_pagesChristoph Lameter2007-02-11
| | | | | | | | | | | This is again simplifies some of the VM counter calculations through the use of the ZVC consolidated counters. [michal.k.k.piotrowski@gmail.com: build fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] Use ZVC for inactive and active countsChristoph Lameter2007-02-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The determination of the dirty ratio to determine writeback behavior is currently based on the number of total pages on the system. However, not all pages in the system may be dirtied. Thus the ratio is always too low and can never reach 100%. The ratio may be particularly skewed if large hugepage allocations, slab allocations or device driver buffers make large sections of memory not available anymore. In that case we may get into a situation in which f.e. the background writeback ratio of 40% cannot be reached anymore which leads to undesired writeback behavior. This patchset fixes that issue by determining the ratio based on the actual pages that may potentially be dirty. These are the pages on the active and the inactive list plus free pages. The problem with those counts has so far been that it is expensive to calculate these because counts from multiple nodes and multiple zones will have to be summed up. This patchset makes these counters ZVC counters. This means that a current sum per zone, per node and for the whole system is always available via global variables and not expensive anymore to calculate. The patchset results in some other good side effects: - Removal of the various functions that sum up free, active and inactive page counts - Cleanup of the functions that display information via the proc filesystem. This patch: The use of a ZVC for nr_inactive and nr_active allows a simplification of some counter operations. More ZVC functionality is used for sums etc in the following patches. [akpm@osdl.org: UP build fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] page_mkwrite caller race fixHugh Dickins2007-02-11
| | | | | | | | | | | | After do_wp_page has tested page_mkwrite, it must release old_page after acquiring page table lock, not before: at some stage that ordering got reversed, leaving a (very unlikely) window in which old_page might be truncated, freed, and reused in the same position. Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] typeof __page_to_pfn with SPARSEMEM=yRandy Dunlap2007-02-11
| | | | | | | | | | | | With CONFIG_SPARSEMEM=y: mm/rmap.c:579: warning: format '%lx' expects type 'long unsigned int', but argument 2 has type 'int' Make __page_to_pfn() return unsigned long. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] /proc/zoneinfo: fix vm stats displayAndrew Morton2007-02-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This early break prevents us from displaying info for the vm stats thresholds if the zone doesn't have any pages in its per-cpu pagesets. So my 800MB i386 box says: Node 0, zone DMA pages free 2365 min 16 low 20 high 24 active 0 inactive 0 scanned 0 (a: 0 i: 0) spanned 4096 present 4044 nr_anon_pages 0 nr_mapped 1 nr_file_pages 0 nr_slab_reclaimable 0 nr_slab_unreclaimable 0 nr_page_table_pages 0 nr_dirty 0 nr_writeback 0 nr_unstable 0 nr_bounce 0 nr_vmscan_write 0 protection: (0, 868, 868) pagesets all_unreclaimable: 0 prev_priority: 12 start_pfn: 0 Node 0, zone Normal pages free 199713 min 934 low 1167 high 1401 active 10215 inactive 4507 scanned 0 (a: 0 i: 0) spanned 225280 present 222420 nr_anon_pages 2685 nr_mapped 1110 nr_file_pages 12055 nr_slab_reclaimable 2216 nr_slab_unreclaimable 1527 nr_page_table_pages 213 nr_dirty 0 nr_writeback 0 nr_unstable 0 nr_bounce 0 nr_vmscan_write 0 protection: (0, 0, 0) pagesets cpu: 0 pcp: 0 count: 152 high: 186 batch: 31 cpu: 0 pcp: 1 count: 13 high: 62 batch: 15 vm stats threshold: 16 cpu: 1 pcp: 0 count: 34 high: 186 batch: 31 cpu: 1 pcp: 1 count: 10 high: 62 batch: 15 vm stats threshold: 16 all_unreclaimable: 0 prev_priority: 12 start_pfn: 4096 Just nuke all that search-for-the-first-non-empty-pageset code. Dunno why it was there in the first place.. Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] Avoid excessive sorting of early_node_map[]Mel Gorman2007-02-11
| | | | | | | | | | | | | | find_min_pfn_for_node() and find_min_pfn_with_active_regions() sort early_node_map[] on every call. This is an excessive amount of sorting and that can be avoided. This patch always searches the whole early_node_map[] in find_min_pfn_for_node() instead of returning the first value found. The map is then only sorted once when required. Successfully boot tested on a number of machines. [akpm@osdl.org: cleanup] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] Remove final references to deprecated "MAP_ANON" page protection flagRobert P. J. Day2007-02-11
| | | | | | | | | Remove the last vestiges of the long-deprecated "MAP_ANON" page protection flag: use "MAP_ANONYMOUS" instead. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] slab: use parameter passed to cache_reap to determine pointer to ↵Christoph Lameter2007-02-11
| | | | | | | | | | | work structure Use the pointer passed to cache_reap to determine the work pointer and consolidate exit paths. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] slab: cache alloc cleanupsPekka Enberg2007-02-11
| | | | | | | | | | | | | | | | | | | Clean up __cache_alloc and __cache_alloc_node functions a bit. We no longer need to do NUMA_BUILD tricks and the UMA allocation path is much simpler. No functional changes in this patch. Note: saves few kernel text bytes on x86 NUMA build due to using gotos in __cache_alloc_node() and moving __GFP_THISNODE check in to fallback_alloc(). Cc: Andy Whitcroft <apw@shadowen.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Manfred Spraul <manfred@colorfullife.com> Acked-by: Christoph Lameter <christoph@lameter.com> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] slab: remove broken PageSlab check from kfree_debugcheckPekka Enberg2007-02-11
| | | | | | | | | | | The PageSlab debug check in kfree_debugcheck() is broken for compound pages. It is also redundant as we already do BUG_ON for non-slab pages in page_get_cache() and page_get_slab() which are always called before we free any actual objects. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* libata: kill ATA_ENABLE_PATAJeff Garzik2007-02-09
| | | | | | | The ATA_ENABLE_PATA define was never meant to be permanent, and in recent kernels, it's already been unconditionally enabled. Remove. Signed-off-by: Jeff Garzik <jeff@garzik.org>
* libata: build fix after dmesg probe output changesJeff Garzik2007-02-09
| | | | Signed-off-by: Jeff Garzik <jeff@garzik.org>
* libata: Early CFA adapters are not required to support mode settingAlan2007-02-09
| | | | | | | | | | | | If we are doing a PIO setup for a CFA card and it blows up with a device error then assume it is an older CFA card which doesn't support this rather than failing the device out of existance. Stands seperate to the quieting patch but that is obviously useful with this change. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* sata_nv: propagate ata_pci_device_do_resume return valueRobert Hancock2007-02-09
| | | | | | | | | | ata_pci_device_do_resume can fail if the PCI device couldn't be re-enabled. Update sata_nv to propagate the return value from this call and to not try to do any other resume activities if it fails. Fixes a compile warning. Signed-off-by: Robert Hancock <hancockr@shaw.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* sata_nv: wait for response on entering/leaving ADMA modeRobert Hancock2007-02-09
| | | | | | | | | | | | | | | Update sata_nv to wait for the controller to indicate via the status register that it has entered the requested state when switching between ADMA mode and register mode. This issue came up recently when debugging some problems with cache flush command timeouts and while it didn't appear to fix that problem, this is something we should likely be doing in any case. Signed-off-by: Robert Hancock <hancockr@shaw.ca> Cc: Tejun Heo <htejun@gmail.com> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* sata_nv: use ADMA for NODATA commandsRobert Hancock2007-02-09
| | | | | | | | | | | | | | | | | | | | | | | Some problems showed up recently with cache flush commands timing out on sata_nv. Previously these commands were always handled by transitioning to legacy mode from ADMA mode first. The timeout problem was worked around already by a change to the interrupt handling code for legacy mode, but for non-data commands like these it appears we can handle them in ADMA mode, so the switch to legacy mode is not needed. This patch changes the behavior so that we use ADMA mode to submit interrupt-driven commands with ATA_PROT_NODATA protocol. In addition to avoiding the problem mentioned above entirely, this avoids the overhead of switching to legacy mode and back to ADMA mode for handling cache flushes. When handling non-DMA-mapped commands, we leave the APRD blank and clear the NV_CPB_CTL_APRD_VALID field in the CPB so the controller does not attempt to read it. Signed-off-by: Robert Hancock <hancockr@shaw.ca> Cc: Jeff Garzik <jeff@garzik.org> Cc: Tejun Heo <htejun@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* sata_nv: cleanup ADMA error handlingRobert Hancock2007-02-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This cleans up a few issues with the error handling in sata_nv in ADMA mode to make it more consistent with other NCQ-capable drivers like ahci and sata_sil24: - When a command failed, we would effectively set AC_ERR_DEV on the queued command always. In the case of NCQ commands this prevents libata from doing a log page query to determine the details of the failed command, since it thinks we've already analyzed. Just set flags in the port ehi->err_mask, then freeze or abort and let libata figure out what went wrong. - The code handled NV_ADMA_STAT_CPBERR as a "really bad error" which caused it to set error flags on every queued command. I don't know exactly what this flag means (no docs, grr!) but from what I can guess from the standard ADMA spec, it just means that one or more of the CPBs had an error, so we just need to go through and do our normal checks in this case. - In the error_handler function the code would always dump the state of all the CPBs. This output seems redundant at this point since libata already dumps the state of all active commands on errors (and it also triggers at times when it shouldn't, like when suspending). Take this out. [akpm@osdl.org: many coding-style fixes] Signed-off-by: Robert Hancock <hancockr@shaw.ca> Cc: Jeff Garzik <jeff@garzik.org> Cc: Tejun Heo <htejun@gmail.com> Cc: Allen Martin <AMartin@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* (2.6.20) pata_mpiix: probing cleanup (resend)Sergei Shtylyov2007-02-09
| | | | | | | | | | | | MPIIX has only single channel IDE which can be configured for either primary or secondary legacy I/O ports and IRQ. So, get rid of the unneeded second probe entry in mpiix_init_one() and of the invalid (but unused anyway) enable bits in mpiix_pre_reset(). Warning: this cleanup has only been compile-tested... Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* (2.6.20) pata_mpiix: fix PIO setup issuesSergei Shtylyov2007-02-09
| | | | | | | | | | Fix clearing/setting the wrong TIME/IE/PPE bits for a slave drive caused by a wrong shift count. Fix the PIO mode 1 being overclocked by wrongly selecting the fast timing bank. Also, fix/rephrase some comments while at it. Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* (2.6.20) pata_oldpiix: fix PIO2 underclockingSergei Shtylyov2007-02-09
| | | | | | | | | Fix the PIO mode 2 using mode 0 timings -- this driver should enable the fast timing bank starting with PIO2, just like the ata_piix driver does. Also, fix/rephrase some comments while at it. Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* ata: Add defines for the iordy bitsAlan2007-02-09
| | | | | | | IORDY and IORDY enable/disable flags. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* Kconfig: clarify ATA_PIIX descriptionAlan2007-02-09
| | | | | | | | People are getting confused about which drivers to enable for PATA PIIX type devices. Change the ATA_PIIX line and help to make it clearer. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* sata_inic162x: fix a few glitches in hardresetTejun Heo2007-02-09
| | | | | | | | | | | | | * Hardreset must not exit without actually performing reset regardless of link status. We're resetting the link after all. * Minor message update. * 150ms delay is meaningful iff link is online after reset is complete. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* libata: add 150ms between completion of hardreset and status checkingTejun Heo2007-02-09
| | | | | | | | | | | | | | Follow the old SRST rule and delay 150ms between completion of hardreset and status checking. Debouncing delay should usually cover this but debounce duration could be shorter than 150ms under certain circumstances. Usefulness depends on host controller implementation but it can't hurt and serves as a reminder that 2s delay for GoVault should also be added here. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* libata: rearrange dmesg info to add full ATA revisionEric D. Mudama2007-02-09
| | | | | | | | | Per Jeff's suggestion, this patch rearranges the info printed for ATA drives into dmesg to add the full ATA firmware revision and model information, while keeping the output to 2 lines. Signed-off-by: Eric D. Mudama <edmudama@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* pata_sl82c105: wrong assumptions about compatible PIO modesSergei Shtylyov2007-02-09
| | | | | | | Fix the wrong "compatible" PIO mode choices: MWDMA0 has 480 ns cycle while PIO1 only has 383 ns cycle, and MWDMA2 timings matchs those of PIO4 exactly. Signed-off-by: Jeff Garzik <jeff@garzik.org>
* libata: add another IRQ calls (libata drivers)Akira Iguchi2007-02-09
| | | | | | | | | | | | | | | | | | This patch is against each libata driver. Two IRQ calls are added in ata_port_operations. - irq_on() is used to enable interrupts. - irq_ack() is used to acknowledge a device interrupt. In most drivers, ata_irq_on() and ata_irq_ack() are used for irq_on and irq_ack respectively. In some drivers (ex: ahci, sata_sil24) which cannot use them as is, ata_dummy_irq_on() and ata_dummy_irq_ack() are used. Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp> Signed-off-by: Akira Iguchi <akira2.iguchi@toshiba.co.jp> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* libata: add another IRQ calls (core and headers)Akira Iguchi2007-02-09
| | | | | | | | | | | | | | | | | | This patch is against the libata core and headers. Two IRQ calls are added in ata_port_operations. - irq_on() is used to enable interrupts. - irq_ack() is used to acknowledge a device interrupt. In most drivers, ata_irq_on() and ata_irq_ack() are used for irq_on and irq_ack respectively. In some drivers (ex: ahci, sata_sil24) which cannot use them as is, ata_dummy_irq_on() and ata_dummy_irq_ack() are used. Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp> Signed-off-by: Akira Iguchi <akira2.iguchi@toshiba.co.jp> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* git-libata-all: forward declare struct deviceAndrew Morton2007-02-09
| | | | | | | | | | In file included from drivers/infiniband/hw/ipath/ipath_diag.c:44: include/linux/io.h:35: warning: 'struct device' declared inside parameter list include/linux/io.h:35: warning: its scope is only this definition or declaration Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* libata: convert to iomapTejun Heo2007-02-09
| | | | | | | | | | | | | | | | Convert libata core layer and LLDs to use iomap. * managed iomap is used. Pointer to pcim_iomap_table() is cached at host->iomap and used through out LLDs. This basically replaces host->mmio_base. * if possible, pcim_iomap_regions() is used Most iomap operation conversions are taken from Jeff Garzik <jgarzik@pobox.com>'s iomap branch. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* pata_platform: fix devres conversionTejun Heo2007-02-09
| | | | | | | | devres updates for pata_platform were dropped while merging devres patches due to merge conflict. This is the updated version. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* iomap: iomap should be in obj-y not in lib-yTejun Heo2007-02-09
| | | | | | | | devres change moved iomap.o from obj-$(CONFIG_GENERIC_IOMAP) to lib-y making it not linked if no in-kernel driver uses it. Fix it. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* [libata] Shuffle DRV_xxx in core and SiS drivers, to kill warningsJeff Garzik2007-02-09
| | | | Signed-off-by: Jeff Garzik <jeff@garzik.org>
* devres: implement pcim_iomap_regions()Tejun Heo2007-02-09
| | | | | | | | | Implement pcim_iomap_regions(). This function takes mask of BARs to request and iomap. No BAR should have length of zero. BARs are iomapped using pcim_iomap_table(). Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* libata: remove unused functionsTejun Heo2007-02-09
| | | | | | | | Now that all LLDs are converted to use devres, default stop callbacks are unused. Remove them. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* libata: update libata LLDs to use devresTejun Heo2007-02-09
| | | | | | | | | | | | | | | | | Update libata LLDs to use devres. Core layer is already converted to support managed LLDs. This patch simplifies initialization and fixes many resource related bugs in init failure and detach path. For example, all converted drivers now handle ata_device_add() failure gracefully without excessive resource rollback code. As most resources are released automatically on driver detach, many drivers don't need or can do with much simpler ->{port|host}_stop(). In general, stop callbacks are need iff port or host needs to be given commands to shut it down. Note that freezing is enough in many cases and ports are automatically frozen before being detached. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* libata: update libata core layer to use devresTejun Heo2007-02-09
| | | | | | | | | | | | | | | | | | | | | | Update libata core layer to use devres. * ata_device_add() acquires all resources in managed mode. * ata_host is allocated as devres associated with ata_host_release. * Port attached status is handled as devres associated with ata_host_attach_release(). * Initialization failure and host removal is handedl by releasing devres group. * Except for ata_scsi_release() removal, LLD interface remains the same. Some functions use hacky is_managed test to support both managed and unmanaged devices. These will go away once all LLDs are updated to use devres. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* libata: implement ata_host_detach()Tejun Heo2007-02-09
| | | | | | | | | | Implement ata_host_detach() which calls ata_port_detach() for each port in the host and export it. ata_port_detach() is now internal and thus un-exported. ata_host_detach() will be used as the 'deregister from libata layer' function after devres conversion. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* devres: device resource managementTejun Heo2007-02-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement device resource management, in short, devres. A device driver can allocate arbirary size of devres data which is associated with a release function. On driver detach, release function is invoked on the devres data, then, devres data is freed. devreses are typed by associated release functions. Some devreses are better represented by single instance of the type while others need multiple instances sharing the same release function. Both usages are supported. devreses can be grouped using devres group such that a device driver can easily release acquired resources halfway through initialization or selectively release resources (e.g. resources for port 1 out of 4 ports). This patch adds devres core including documentation and the following managed interfaces. * alloc/free : devm_kzalloc(), devm_kzfree() * IO region : devm_request_region(), devm_release_region() * IRQ : devm_request_irq(), devm_free_irq() * DMA : dmam_alloc_coherent(), dmam_free_coherent(), dmam_declare_coherent_memory(), dmam_pool_create(), dmam_pool_destroy() * PCI : pcim_enable_device(), pcim_pin_device(), pci_is_managed() * iomap : devm_ioport_map(), devm_ioport_unmap(), devm_ioremap(), devm_ioremap_nocache(), devm_iounmap(), pcim_iomap_table(), pcim_iomap(), pcim_iounmap() Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* fix CONFIG_SATA_SIS=y compile errorAdrian Bunk2007-02-09
| | | | | | | | | | | | | | | | | Static code shouldn't be used from other modules. drivers/built-in.o: In function `sis_init_one': sata_sis.c:(.text+0x7634cd): undefined reference to `sis_info133' sata_sis.c:(.text+0x7634d6): undefined reference to `sis_info133' While I was at it, I also moved the prototype of this struct to a header file. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Jeff Garzik <jeff@garzik.org> Cc: Tejun Heo <htejun@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* sata_sis: Support for PATA supportsAlan2007-02-09
| | | | | | | | | | | | This is quick rework of the patch Uwe proposed but using Kconfig not ifdefs and user selection to sort out PATA support. Instead of ifdefs and requiring the user to select both drivers the SATA driver selects the PATA one. For neatness I've also moved the extern into the function that uses it. Signed-off-by: Alan Cox Signed-off-by: Jeff Garzik <jeff@garzik.org>
* libata: implement HDIO_GET_IDENTITYTejun Heo2007-02-09
| | | | | | | | | 'hdparm -I' doesn't work with ATAPI devices and sg_sat is not widely spread yet leaving no easy way to access ATAPI IDENTIFY data. Implement HDIO_GET_IDENTITY such that at least 'hdparm -i' works. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* libata: trivial stuffAlan2007-02-09
| | | | | | | Readability/typos etc Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
* sata_promise: kill qc->nsectTejun Heo2007-02-09
| | | | | | | Merge order left qc->nsect usage in sata_promise dangling. Kill it. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>