From d4ec36bac3de39b7e10ec8f42fbdd20d9a9ed753 Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Sun, 21 Jun 2009 12:32:39 +0200 Subject: sched: Documentation/sched-rt-group: Fix style issues & bump version - add missing closing bracket - fix two 80-chars issues (as the rest of the document adheres to it) - bump a reference to kernel version, so the document does not feel aged Signed-off-by: Wolfram Sang Cc: Peter Zijlstra LKML-Reference: <1245580359-4465-1-git-send-email-w.sang@pengutronix.de> Signed-off-by: Ingo Molnar --- Documentation/scheduler/sched-rt-group.txt | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) (limited to 'Documentation') diff --git a/Documentation/scheduler/sched-rt-group.txt b/Documentation/scheduler/sched-rt-group.txt index 1df7f9cdab05..86eabe6c3419 100644 --- a/Documentation/scheduler/sched-rt-group.txt +++ b/Documentation/scheduler/sched-rt-group.txt @@ -73,7 +73,7 @@ The remaining CPU time will be used for user input and other tasks. Because realtime tasks have explicitly allocated the CPU time they need to perform their tasks, buffer underruns in the graphics or audio can be eliminated. -NOTE: the above example is not fully implemented as of yet (2.6.25). We still +NOTE: the above example is not fully implemented yet. We still lack an EDF scheduler to make non-uniform periods usable. @@ -140,14 +140,15 @@ The other option is: .o CONFIG_CGROUP_SCHED (aka "Basis for grouping tasks" = "Control groups") -This uses the /cgroup virtual file system and "/cgroup//cpu.rt_runtime_us" -to control the CPU time reserved for each control group instead. +This uses the /cgroup virtual file system and +"/cgroup//cpu.rt_runtime_us" to control the CPU time reserved for each +control group instead. For more information on working with control groups, you should read Documentation/cgroups/cgroups.txt as well. -Group settings are checked against the following limits in order to keep the configuration -schedulable: +Group settings are checked against the following limits in order to keep the +configuration schedulable: \Sum_{i} runtime_{i} / global_period <= global_runtime / global_period @@ -189,7 +190,7 @@ Implementing SCHED_EDF might take a while to complete. Priority Inheritance is the biggest challenge as the current linux PI infrastructure is geared towards the limited static priority levels 0-99. With deadline scheduling you need to do deadline inheritance (since priority is inversely proportional to the -deadline delta (deadline - now). +deadline delta (deadline - now)). This means the whole PI machinery will have to be reworked - and that is one of the most complex pieces of code we have. -- cgit v1.2.2 From fa8a7094ba1679b4b9b443e0ac9f5e046c79ee8d Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 22 Jun 2009 11:56:24 +0900 Subject: x86: implement percpu_alloc kernel parameter According to Andi, it isn't clear whether lpage allocator is worth the trouble as there are many processors where PMD TLB is far scarcer than PTE TLB. The advantage or disadvantage probably depends on the actual size of percpu area and specific processor. As performance degradation due to TLB pressure tends to be highly workload specific and subtle, it is difficult to decide which way to go without more data. This patch implements percpu_alloc kernel parameter to allow selecting which first chunk allocator to use to ease debugging and testing. While at it, make sure all the failure paths report why something failed to help determining why certain allocator isn't working. Also, kill the "Great future plan" comment which had already been realized quite some time ago. [ Impact: allow explicit percpu first chunk allocator selection ] Signed-off-by: Tejun Heo Reported-by: Jan Beulich Cc: Andi Kleen Cc: Ingo Molnar --- Documentation/kernel-parameters.txt | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'Documentation') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 08def8deb5f5..ecad946920d1 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1882,6 +1882,12 @@ and is between 256 and 4096 characters. It is defined in the file Format: { 0 | 1 } See arch/parisc/kernel/pdc_chassis.c + percpu_alloc= [X86] Select which percpu first chunk allocator to use. + Allowed values are one of "lpage", "embed" and "4k". + See comments in arch/x86/kernel/setup_percpu.c for + details on each allocator. This parameter is primarily + for debugging and performance comparison. + pf. [PARIDE] See Documentation/blockdev/paride.txt. -- cgit v1.2.2 From b053dc5a722eade28514f2cc922caf7a4baad987 Mon Sep 17 00:00:00 2001 From: Kumar Gala Date: Fri, 19 Jun 2009 08:31:05 -0500 Subject: powerpc: Refactor device tree binding Split device tree binding out of booting-without-of.txt and put them into their own files per binding. Signed-off-by: Kumar Gala --- Documentation/powerpc/booting-without-of.txt | 1168 +--------------------- Documentation/powerpc/dts-bindings/4xx/emac.txt | 148 +++ Documentation/powerpc/dts-bindings/gpio/gpio.txt | 50 + Documentation/powerpc/dts-bindings/gpio/mdio.txt | 19 + Documentation/powerpc/dts-bindings/marvell.txt | 521 ++++++++++ Documentation/powerpc/dts-bindings/phy.txt | 25 + Documentation/powerpc/dts-bindings/spi-bus.txt | 57 ++ Documentation/powerpc/dts-bindings/usb-ehci.txt | 25 + Documentation/powerpc/dts-bindings/xilinx.txt | 295 ++++++ 9 files changed, 1142 insertions(+), 1166 deletions(-) create mode 100644 Documentation/powerpc/dts-bindings/4xx/emac.txt create mode 100644 Documentation/powerpc/dts-bindings/gpio/gpio.txt create mode 100644 Documentation/powerpc/dts-bindings/gpio/mdio.txt create mode 100644 Documentation/powerpc/dts-bindings/marvell.txt create mode 100644 Documentation/powerpc/dts-bindings/phy.txt create mode 100644 Documentation/powerpc/dts-bindings/spi-bus.txt create mode 100644 Documentation/powerpc/dts-bindings/usb-ehci.txt create mode 100644 Documentation/powerpc/dts-bindings/xilinx.txt (limited to 'Documentation') diff --git a/Documentation/powerpc/booting-without-of.txt b/Documentation/powerpc/booting-without-of.txt index 8d999d862d0e..79f533f38c61 100644 --- a/Documentation/powerpc/booting-without-of.txt +++ b/Documentation/powerpc/booting-without-of.txt @@ -1238,1122 +1238,7 @@ descriptions for the SOC devices for which new nodes have been defined; this list will expand as more and more SOC-containing platforms are moved over to use the flattened-device-tree model. - a) PHY nodes - - Required properties: - - - device_type : Should be "ethernet-phy" - - interrupts : where a is the interrupt number and b is a - field that represents an encoding of the sense and level - information for the interrupt. This should be encoded based on - the information in section 2) depending on the type of interrupt - controller you have. - - interrupt-parent : the phandle for the interrupt controller that - services interrupts for this device. - - reg : The ID number for the phy, usually a small integer - - linux,phandle : phandle for this node; likely referenced by an - ethernet controller node. - - - Example: - - ethernet-phy@0 { - linux,phandle = <2452000> - interrupt-parent = <40000>; - interrupts = <35 1>; - reg = <0>; - device_type = "ethernet-phy"; - }; - - - b) Interrupt controllers - - Some SOC devices contain interrupt controllers that are different - from the standard Open PIC specification. The SOC device nodes for - these types of controllers should be specified just like a standard - OpenPIC controller. Sense and level information should be encoded - as specified in section 2) of this chapter for each device that - specifies an interrupt. - - Example : - - pic@40000 { - linux,phandle = <40000>; - interrupt-controller; - #address-cells = <0>; - reg = <40000 40000>; - compatible = "chrp,open-pic"; - device_type = "open-pic"; - }; - - c) 4xx/Axon EMAC ethernet nodes - - The EMAC ethernet controller in IBM and AMCC 4xx chips, and also - the Axon bridge. To operate this needs to interact with a ths - special McMAL DMA controller, and sometimes an RGMII or ZMII - interface. In addition to the nodes and properties described - below, the node for the OPB bus on which the EMAC sits must have a - correct clock-frequency property. - - i) The EMAC node itself - - Required properties: - - device_type : "network" - - - compatible : compatible list, contains 2 entries, first is - "ibm,emac-CHIP" where CHIP is the host ASIC (440gx, - 405gp, Axon) and second is either "ibm,emac" or - "ibm,emac4". For Axon, thus, we have: "ibm,emac-axon", - "ibm,emac4" - - interrupts : - - interrupt-parent : optional, if needed for interrupt mapping - - reg : - - local-mac-address : 6 bytes, MAC address - - mal-device : phandle of the associated McMAL node - - mal-tx-channel : 1 cell, index of the tx channel on McMAL associated - with this EMAC - - mal-rx-channel : 1 cell, index of the rx channel on McMAL associated - with this EMAC - - cell-index : 1 cell, hardware index of the EMAC cell on a given - ASIC (typically 0x0 and 0x1 for EMAC0 and EMAC1 on - each Axon chip) - - max-frame-size : 1 cell, maximum frame size supported in bytes - - rx-fifo-size : 1 cell, Rx fifo size in bytes for 10 and 100 Mb/sec - operations. - For Axon, 2048 - - tx-fifo-size : 1 cell, Tx fifo size in bytes for 10 and 100 Mb/sec - operations. - For Axon, 2048. - - fifo-entry-size : 1 cell, size of a fifo entry (used to calculate - thresholds). - For Axon, 0x00000010 - - mal-burst-size : 1 cell, MAL burst size (used to calculate thresholds) - in bytes. - For Axon, 0x00000100 (I think ...) - - phy-mode : string, mode of operations of the PHY interface. - Supported values are: "mii", "rmii", "smii", "rgmii", - "tbi", "gmii", rtbi", "sgmii". - For Axon on CAB, it is "rgmii" - - mdio-device : 1 cell, required iff using shared MDIO registers - (440EP). phandle of the EMAC to use to drive the - MDIO lines for the PHY used by this EMAC. - - zmii-device : 1 cell, required iff connected to a ZMII. phandle of - the ZMII device node - - zmii-channel : 1 cell, required iff connected to a ZMII. Which ZMII - channel or 0xffffffff if ZMII is only used for MDIO. - - rgmii-device : 1 cell, required iff connected to an RGMII. phandle - of the RGMII device node. - For Axon: phandle of plb5/plb4/opb/rgmii - - rgmii-channel : 1 cell, required iff connected to an RGMII. Which - RGMII channel is used by this EMAC. - Fox Axon: present, whatever value is appropriate for each - EMAC, that is the content of the current (bogus) "phy-port" - property. - - Optional properties: - - phy-address : 1 cell, optional, MDIO address of the PHY. If absent, - a search is performed. - - phy-map : 1 cell, optional, bitmap of addresses to probe the PHY - for, used if phy-address is absent. bit 0x00000001 is - MDIO address 0. - For Axon it can be absent, though my current driver - doesn't handle phy-address yet so for now, keep - 0x00ffffff in it. - - rx-fifo-size-gige : 1 cell, Rx fifo size in bytes for 1000 Mb/sec - operations (if absent the value is the same as - rx-fifo-size). For Axon, either absent or 2048. - - tx-fifo-size-gige : 1 cell, Tx fifo size in bytes for 1000 Mb/sec - operations (if absent the value is the same as - tx-fifo-size). For Axon, either absent or 2048. - - tah-device : 1 cell, optional. If connected to a TAH engine for - offload, phandle of the TAH device node. - - tah-channel : 1 cell, optional. If appropriate, channel used on the - TAH engine. - - Example: - - EMAC0: ethernet@40000800 { - device_type = "network"; - compatible = "ibm,emac-440gp", "ibm,emac"; - interrupt-parent = <&UIC1>; - interrupts = <1c 4 1d 4>; - reg = <40000800 70>; - local-mac-address = [00 04 AC E3 1B 1E]; - mal-device = <&MAL0>; - mal-tx-channel = <0 1>; - mal-rx-channel = <0>; - cell-index = <0>; - max-frame-size = <5dc>; - rx-fifo-size = <1000>; - tx-fifo-size = <800>; - phy-mode = "rmii"; - phy-map = <00000001>; - zmii-device = <&ZMII0>; - zmii-channel = <0>; - }; - - ii) McMAL node - - Required properties: - - device_type : "dma-controller" - - compatible : compatible list, containing 2 entries, first is - "ibm,mcmal-CHIP" where CHIP is the host ASIC (like - emac) and the second is either "ibm,mcmal" or - "ibm,mcmal2". - For Axon, "ibm,mcmal-axon","ibm,mcmal2" - - interrupts : . - For Axon: This is _different_ from the current - firmware. We use the "delayed" interrupts for txeob - and rxeob. Thus we end up with mapping those 5 MPIC - interrupts, all level positive sensitive: 10, 11, 32, - 33, 34 (in decimal) - - dcr-reg : < DCR registers range > - - dcr-parent : if needed for dcr-reg - - num-tx-chans : 1 cell, number of Tx channels - - num-rx-chans : 1 cell, number of Rx channels - - iii) ZMII node - - Required properties: - - compatible : compatible list, containing 2 entries, first is - "ibm,zmii-CHIP" where CHIP is the host ASIC (like - EMAC) and the second is "ibm,zmii". - For Axon, there is no ZMII node. - - reg : - - iv) RGMII node - - Required properties: - - compatible : compatible list, containing 2 entries, first is - "ibm,rgmii-CHIP" where CHIP is the host ASIC (like - EMAC) and the second is "ibm,rgmii". - For Axon, "ibm,rgmii-axon","ibm,rgmii" - - reg : - - revision : as provided by the RGMII new version register if - available. - For Axon: 0x0000012a - - d) Xilinx IP cores - - The Xilinx EDK toolchain ships with a set of IP cores (devices) for use - in Xilinx Spartan and Virtex FPGAs. The devices cover the whole range - of standard device types (network, serial, etc.) and miscellaneous - devices (gpio, LCD, spi, etc). Also, since these devices are - implemented within the fpga fabric every instance of the device can be - synthesised with different options that change the behaviour. - - Each IP-core has a set of parameters which the FPGA designer can use to - control how the core is synthesized. Historically, the EDK tool would - extract the device parameters relevant to device drivers and copy them - into an 'xparameters.h' in the form of #define symbols. This tells the - device drivers how the IP cores are configured, but it requres the kernel - to be recompiled every time the FPGA bitstream is resynthesized. - - The new approach is to export the parameters into the device tree and - generate a new device tree each time the FPGA bitstream changes. The - parameters which used to be exported as #defines will now become - properties of the device node. In general, device nodes for IP-cores - will take the following form: - - (name): (generic-name)@(base-address) { - compatible = "xlnx,(ip-core-name)-(HW_VER)" - [, (list of compatible devices), ...]; - reg = <(baseaddr) (size)>; - interrupt-parent = <&interrupt-controller-phandle>; - interrupts = < ... >; - xlnx,(parameter1) = "(string-value)"; - xlnx,(parameter2) = <(int-value)>; - }; - - (generic-name): an open firmware-style name that describes the - generic class of device. Preferably, this is one word, such - as 'serial' or 'ethernet'. - (ip-core-name): the name of the ip block (given after the BEGIN - directive in system.mhs). Should be in lowercase - and all underscores '_' converted to dashes '-'. - (name): is derived from the "PARAMETER INSTANCE" value. - (parameter#): C_* parameters from system.mhs. The C_ prefix is - dropped from the parameter name, the name is converted - to lowercase and all underscore '_' characters are - converted to dashes '-'. - (baseaddr): the baseaddr parameter value (often named C_BASEADDR). - (HW_VER): from the HW_VER parameter. - (size): the address range size (often C_HIGHADDR - C_BASEADDR + 1). - - Typically, the compatible list will include the exact IP core version - followed by an older IP core version which implements the same - interface or any other device with the same interface. - - 'reg', 'interrupt-parent' and 'interrupts' are all optional properties. - - For example, the following block from system.mhs: - - BEGIN opb_uartlite - PARAMETER INSTANCE = opb_uartlite_0 - PARAMETER HW_VER = 1.00.b - PARAMETER C_BAUDRATE = 115200 - PARAMETER C_DATA_BITS = 8 - PARAMETER C_ODD_PARITY = 0 - PARAMETER C_USE_PARITY = 0 - PARAMETER C_CLK_FREQ = 50000000 - PARAMETER C_BASEADDR = 0xEC100000 - PARAMETER C_HIGHADDR = 0xEC10FFFF - BUS_INTERFACE SOPB = opb_7 - PORT OPB_Clk = CLK_50MHz - PORT Interrupt = opb_uartlite_0_Interrupt - PORT RX = opb_uartlite_0_RX - PORT TX = opb_uartlite_0_TX - PORT OPB_Rst = sys_bus_reset_0 - END - - becomes the following device tree node: - - opb_uartlite_0: serial@ec100000 { - device_type = "serial"; - compatible = "xlnx,opb-uartlite-1.00.b"; - reg = ; - interrupt-parent = <&opb_intc_0>; - interrupts = <1 0>; // got this from the opb_intc parameters - current-speed = ; // standard serial device prop - clock-frequency = ; // standard serial device prop - xlnx,data-bits = <8>; - xlnx,odd-parity = <0>; - xlnx,use-parity = <0>; - }; - - Some IP cores actually implement 2 or more logical devices. In - this case, the device should still describe the whole IP core with - a single node and add a child node for each logical device. The - ranges property can be used to translate from parent IP-core to the - registers of each device. In addition, the parent node should be - compatible with the bus type 'xlnx,compound', and should contain - #address-cells and #size-cells, as with any other bus. (Note: this - makes the assumption that both logical devices have the same bus - binding. If this is not true, then separate nodes should be used - for each logical device). The 'cell-index' property can be used to - enumerate logical devices within an IP core. For example, the - following is the system.mhs entry for the dual ps2 controller found - on the ml403 reference design. - - BEGIN opb_ps2_dual_ref - PARAMETER INSTANCE = opb_ps2_dual_ref_0 - PARAMETER HW_VER = 1.00.a - PARAMETER C_BASEADDR = 0xA9000000 - PARAMETER C_HIGHADDR = 0xA9001FFF - BUS_INTERFACE SOPB = opb_v20_0 - PORT Sys_Intr1 = ps2_1_intr - PORT Sys_Intr2 = ps2_2_intr - PORT Clkin1 = ps2_clk_rx_1 - PORT Clkin2 = ps2_clk_rx_2 - PORT Clkpd1 = ps2_clk_tx_1 - PORT Clkpd2 = ps2_clk_tx_2 - PORT Rx1 = ps2_d_rx_1 - PORT Rx2 = ps2_d_rx_2 - PORT Txpd1 = ps2_d_tx_1 - PORT Txpd2 = ps2_d_tx_2 - END - - It would result in the following device tree nodes: - - opb_ps2_dual_ref_0: opb-ps2-dual-ref@a9000000 { - #address-cells = <1>; - #size-cells = <1>; - compatible = "xlnx,compound"; - ranges = <0 a9000000 2000>; - // If this device had extra parameters, then they would - // go here. - ps2@0 { - compatible = "xlnx,opb-ps2-dual-ref-1.00.a"; - reg = <0 40>; - interrupt-parent = <&opb_intc_0>; - interrupts = <3 0>; - cell-index = <0>; - }; - ps2@1000 { - compatible = "xlnx,opb-ps2-dual-ref-1.00.a"; - reg = <1000 40>; - interrupt-parent = <&opb_intc_0>; - interrupts = <3 0>; - cell-index = <0>; - }; - }; - - Also, the system.mhs file defines bus attachments from the processor - to the devices. The device tree structure should reflect the bus - attachments. Again an example; this system.mhs fragment: - - BEGIN ppc405_virtex4 - PARAMETER INSTANCE = ppc405_0 - PARAMETER HW_VER = 1.01.a - BUS_INTERFACE DPLB = plb_v34_0 - BUS_INTERFACE IPLB = plb_v34_0 - END - - BEGIN opb_intc - PARAMETER INSTANCE = opb_intc_0 - PARAMETER HW_VER = 1.00.c - PARAMETER C_BASEADDR = 0xD1000FC0 - PARAMETER C_HIGHADDR = 0xD1000FDF - BUS_INTERFACE SOPB = opb_v20_0 - END - - BEGIN opb_uart16550 - PARAMETER INSTANCE = opb_uart16550_0 - PARAMETER HW_VER = 1.00.d - PARAMETER C_BASEADDR = 0xa0000000 - PARAMETER C_HIGHADDR = 0xa0001FFF - BUS_INTERFACE SOPB = opb_v20_0 - END - - BEGIN plb_v34 - PARAMETER INSTANCE = plb_v34_0 - PARAMETER HW_VER = 1.02.a - END - - BEGIN plb_bram_if_cntlr - PARAMETER INSTANCE = plb_bram_if_cntlr_0 - PARAMETER HW_VER = 1.00.b - PARAMETER C_BASEADDR = 0xFFFF0000 - PARAMETER C_HIGHADDR = 0xFFFFFFFF - BUS_INTERFACE SPLB = plb_v34_0 - END - - BEGIN plb2opb_bridge - PARAMETER INSTANCE = plb2opb_bridge_0 - PARAMETER HW_VER = 1.01.a - PARAMETER C_RNG0_BASEADDR = 0x20000000 - PARAMETER C_RNG0_HIGHADDR = 0x3FFFFFFF - PARAMETER C_RNG1_BASEADDR = 0x60000000 - PARAMETER C_RNG1_HIGHADDR = 0x7FFFFFFF - PARAMETER C_RNG2_BASEADDR = 0x80000000 - PARAMETER C_RNG2_HIGHADDR = 0xBFFFFFFF - PARAMETER C_RNG3_BASEADDR = 0xC0000000 - PARAMETER C_RNG3_HIGHADDR = 0xDFFFFFFF - BUS_INTERFACE SPLB = plb_v34_0 - BUS_INTERFACE MOPB = opb_v20_0 - END - - Gives this device tree (some properties removed for clarity): - - plb@0 { - #address-cells = <1>; - #size-cells = <1>; - compatible = "xlnx,plb-v34-1.02.a"; - device_type = "ibm,plb"; - ranges; // 1:1 translation - - plb_bram_if_cntrl_0: bram@ffff0000 { - reg = ; - } - - opb@20000000 { - #address-cells = <1>; - #size-cells = <1>; - ranges = <20000000 20000000 20000000 - 60000000 60000000 20000000 - 80000000 80000000 40000000 - c0000000 c0000000 20000000>; - - opb_uart16550_0: serial@a0000000 { - reg = ; - }; - - opb_intc_0: interrupt-controller@d1000fc0 { - reg = ; - }; - }; - }; - - That covers the general approach to binding xilinx IP cores into the - device tree. The following are bindings for specific devices: - - i) Xilinx ML300 Framebuffer - - Simple framebuffer device from the ML300 reference design (also on the - ML403 reference design as well as others). - - Optional properties: - - resolution = : pixel resolution of framebuffer. Some - implementations use a different resolution. - Default is - - virt-resolution = : Size of framebuffer in memory. - Default is . - - rotate-display (empty) : rotate display 180 degrees. - - ii) Xilinx SystemACE - - The Xilinx SystemACE device is used to program FPGAs from an FPGA - bitstream stored on a CF card. It can also be used as a generic CF - interface device. - - Optional properties: - - 8-bit (empty) : Set this property for SystemACE in 8 bit mode - - iii) Xilinx EMAC and Xilinx TEMAC - - Xilinx Ethernet devices. In addition to general xilinx properties - listed above, nodes for these devices should include a phy-handle - property, and may include other common network device properties - like local-mac-address. - - iv) Xilinx Uartlite - - Xilinx uartlite devices are simple fixed speed serial ports. - - Required properties: - - current-speed : Baud rate of uartlite - - v) Xilinx hwicap - - Xilinx hwicap devices provide access to the configuration logic - of the FPGA through the Internal Configuration Access Port - (ICAP). The ICAP enables partial reconfiguration of the FPGA, - readback of the configuration information, and some control over - 'warm boots' of the FPGA fabric. - - Required properties: - - xlnx,family : The family of the FPGA, necessary since the - capabilities of the underlying ICAP hardware - differ between different families. May be - 'virtex2p', 'virtex4', or 'virtex5'. - - vi) Xilinx Uart 16550 - - Xilinx UART 16550 devices are very similar to the NS16550 but with - different register spacing and an offset from the base address. - - Required properties: - - clock-frequency : Frequency of the clock input - - reg-offset : A value of 3 is required - - reg-shift : A value of 2 is required - - e) USB EHCI controllers - - Required properties: - - compatible : should be "usb-ehci". - - reg : should contain at least address and length of the standard EHCI - register set for the device. Optional platform-dependent registers - (debug-port or other) can be also specified here, but only after - definition of standard EHCI registers. - - interrupts : one EHCI interrupt should be described here. - If device registers are implemented in big endian mode, the device - node should have "big-endian-regs" property. - If controller implementation operates with big endian descriptors, - "big-endian-desc" property should be specified. - If both big endian registers and descriptors are used by the controller - implementation, "big-endian" property can be specified instead of having - both "big-endian-regs" and "big-endian-desc". - - Example (Sequoia 440EPx): - ehci@e0000300 { - compatible = "ibm,usb-ehci-440epx", "usb-ehci"; - interrupt-parent = <&UIC0>; - interrupts = <1a 4>; - reg = <0 e0000300 90 0 e0000390 70>; - big-endian; - }; - - f) MDIO on GPIOs - - Currently defined compatibles: - - virtual,gpio-mdio - - MDC and MDIO lines connected to GPIO controllers are listed in the - gpios property as described in section VIII.1 in the following order: - - MDC, MDIO. - - Example: - - mdio { - compatible = "virtual,mdio-gpio"; - #address-cells = <1>; - #size-cells = <0>; - gpios = <&qe_pio_a 11 - &qe_pio_c 6>; - }; - - g) SPI (Serial Peripheral Interface) busses - - SPI busses can be described with a node for the SPI master device - and a set of child nodes for each SPI slave on the bus. For this - discussion, it is assumed that the system's SPI controller is in - SPI master mode. This binding does not describe SPI controllers - in slave mode. - - The SPI master node requires the following properties: - - #address-cells - number of cells required to define a chip select - address on the SPI bus. - - #size-cells - should be zero. - - compatible - name of SPI bus controller following generic names - recommended practice. - No other properties are required in the SPI bus node. It is assumed - that a driver for an SPI bus device will understand that it is an SPI bus. - However, the binding does not attempt to define the specific method for - assigning chip select numbers. Since SPI chip select configuration is - flexible and non-standardized, it is left out of this binding with the - assumption that board specific platform code will be used to manage - chip selects. Individual drivers can define additional properties to - support describing the chip select layout. - - SPI slave nodes must be children of the SPI master node and can - contain the following properties. - - reg - (required) chip select address of device. - - compatible - (required) name of SPI device following generic names - recommended practice - - spi-max-frequency - (required) Maximum SPI clocking speed of device in Hz - - spi-cpol - (optional) Empty property indicating device requires - inverse clock polarity (CPOL) mode - - spi-cpha - (optional) Empty property indicating device requires - shifted clock phase (CPHA) mode - - spi-cs-high - (optional) Empty property indicating device requires - chip select active high - - SPI example for an MPC5200 SPI bus: - spi@f00 { - #address-cells = <1>; - #size-cells = <0>; - compatible = "fsl,mpc5200b-spi","fsl,mpc5200-spi"; - reg = <0xf00 0x20>; - interrupts = <2 13 0 2 14 0>; - interrupt-parent = <&mpc5200_pic>; - - ethernet-switch@0 { - compatible = "micrel,ks8995m"; - spi-max-frequency = <1000000>; - reg = <0>; - }; - - codec@1 { - compatible = "ti,tlv320aic26"; - spi-max-frequency = <100000>; - reg = <1>; - }; - }; - -VII - Marvell Discovery mv64[345]6x System Controller chips -=========================================================== - -The Marvell mv64[345]60 series of system controller chips contain -many of the peripherals needed to implement a complete computer -system. In this section, we define device tree nodes to describe -the system controller chip itself and each of the peripherals -which it contains. Compatible string values for each node are -prefixed with the string "marvell,", for Marvell Technology Group Ltd. - -1) The /system-controller node - - This node is used to represent the system-controller and must be - present when the system uses a system controller chip. The top-level - system-controller node contains information that is global to all - devices within the system controller chip. The node name begins - with "system-controller" followed by the unit address, which is - the base address of the memory-mapped register set for the system - controller chip. - - Required properties: - - - ranges : Describes the translation of system controller addresses - for memory mapped registers. - - clock-frequency: Contains the main clock frequency for the system - controller chip. - - reg : This property defines the address and size of the - memory-mapped registers contained within the system controller - chip. The address specified in the "reg" property should match - the unit address of the system-controller node. - - #address-cells : Address representation for system controller - devices. This field represents the number of cells needed to - represent the address of the memory-mapped registers of devices - within the system controller chip. - - #size-cells : Size representation for for the memory-mapped - registers within the system controller chip. - - #interrupt-cells : Defines the width of cells used to represent - interrupts. - - Optional properties: - - - model : The specific model of the system controller chip. Such - as, "mv64360", "mv64460", or "mv64560". - - compatible : A string identifying the compatibility identifiers - of the system controller chip. - - The system-controller node contains child nodes for each system - controller device that the platform uses. Nodes should not be created - for devices which exist on the system controller chip but are not used - - Example Marvell Discovery mv64360 system-controller node: - - system-controller@f1000000 { /* Marvell Discovery mv64360 */ - #address-cells = <1>; - #size-cells = <1>; - model = "mv64360"; /* Default */ - compatible = "marvell,mv64360"; - clock-frequency = <133333333>; - reg = <0xf1000000 0x10000>; - virtual-reg = <0xf1000000>; - ranges = <0x88000000 0x88000000 0x1000000 /* PCI 0 I/O Space */ - 0x80000000 0x80000000 0x8000000 /* PCI 0 MEM Space */ - 0xa0000000 0xa0000000 0x4000000 /* User FLASH */ - 0x00000000 0xf1000000 0x0010000 /* Bridge's regs */ - 0xf2000000 0xf2000000 0x0040000>;/* Integrated SRAM */ - - [ child node definitions... ] - } - -2) Child nodes of /system-controller - - a) Marvell Discovery MDIO bus - - The MDIO is a bus to which the PHY devices are connected. For each - device that exists on this bus, a child node should be created. See - the definition of the PHY node below for an example of how to define - a PHY. - - Required properties: - - #address-cells : Should be <1> - - #size-cells : Should be <0> - - device_type : Should be "mdio" - - compatible : Should be "marvell,mv64360-mdio" - - Example: - - mdio { - #address-cells = <1>; - #size-cells = <0>; - device_type = "mdio"; - compatible = "marvell,mv64360-mdio"; - - ethernet-phy@0 { - ...... - }; - }; - - - b) Marvell Discovery ethernet controller - - The Discover ethernet controller is described with two levels - of nodes. The first level describes an ethernet silicon block - and the second level describes up to 3 ethernet nodes within - that block. The reason for the multiple levels is that the - registers for the node are interleaved within a single set - of registers. The "ethernet-block" level describes the - shared register set, and the "ethernet" nodes describe ethernet - port-specific properties. - - Ethernet block node - - Required properties: - - #address-cells : <1> - - #size-cells : <0> - - compatible : "marvell,mv64360-eth-block" - - reg : Offset and length of the register set for this block - - Example Discovery Ethernet block node: - ethernet-block@2000 { - #address-cells = <1>; - #size-cells = <0>; - compatible = "marvell,mv64360-eth-block"; - reg = <0x2000 0x2000>; - ethernet@0 { - ....... - }; - }; - - Ethernet port node - - Required properties: - - device_type : Should be "network". - - compatible : Should be "marvell,mv64360-eth". - - reg : Should be <0>, <1>, or <2>, according to which registers - within the silicon block the device uses. - - interrupts : where a is the interrupt number for the port. - - interrupt-parent : the phandle for the interrupt controller - that services interrupts for this device. - - phy : the phandle for the PHY connected to this ethernet - controller. - - local-mac-address : 6 bytes, MAC address - - Example Discovery Ethernet port node: - ethernet@0 { - device_type = "network"; - compatible = "marvell,mv64360-eth"; - reg = <0>; - interrupts = <32>; - interrupt-parent = <&PIC>; - phy = <&PHY0>; - local-mac-address = [ 00 00 00 00 00 00 ]; - }; - - - - c) Marvell Discovery PHY nodes - - Required properties: - - device_type : Should be "ethernet-phy" - - interrupts : where a is the interrupt number for this phy. - - interrupt-parent : the phandle for the interrupt controller that - services interrupts for this device. - - reg : The ID number for the phy, usually a small integer - - Example Discovery PHY node: - ethernet-phy@1 { - device_type = "ethernet-phy"; - compatible = "broadcom,bcm5421"; - interrupts = <76>; /* GPP 12 */ - interrupt-parent = <&PIC>; - reg = <1>; - }; - - - d) Marvell Discovery SDMA nodes - - Represent DMA hardware associated with the MPSC (multiprotocol - serial controllers). - - Required properties: - - compatible : "marvell,mv64360-sdma" - - reg : Offset and length of the register set for this device - - interrupts : where a is the interrupt number for the DMA - device. - - interrupt-parent : the phandle for the interrupt controller - that services interrupts for this device. - - Example Discovery SDMA node: - sdma@4000 { - compatible = "marvell,mv64360-sdma"; - reg = <0x4000 0xc18>; - virtual-reg = <0xf1004000>; - interrupts = <36>; - interrupt-parent = <&PIC>; - }; - - - e) Marvell Discovery BRG nodes - - Represent baud rate generator hardware associated with the MPSC - (multiprotocol serial controllers). - - Required properties: - - compatible : "marvell,mv64360-brg" - - reg : Offset and length of the register set for this device - - clock-src : A value from 0 to 15 which selects the clock - source for the baud rate generator. This value corresponds - to the CLKS value in the BRGx configuration register. See - the mv64x60 User's Manual. - - clock-frequence : The frequency (in Hz) of the baud rate - generator's input clock. - - current-speed : The current speed setting (presumably by - firmware) of the baud rate generator. - - Example Discovery BRG node: - brg@b200 { - compatible = "marvell,mv64360-brg"; - reg = <0xb200 0x8>; - clock-src = <8>; - clock-frequency = <133333333>; - current-speed = <9600>; - }; - - - f) Marvell Discovery CUNIT nodes - - Represent the Serial Communications Unit device hardware. - - Required properties: - - reg : Offset and length of the register set for this device - - Example Discovery CUNIT node: - cunit@f200 { - reg = <0xf200 0x200>; - }; - - - g) Marvell Discovery MPSCROUTING nodes - - Represent the Discovery's MPSC routing hardware - - Required properties: - - reg : Offset and length of the register set for this device - - Example Discovery CUNIT node: - mpscrouting@b500 { - reg = <0xb400 0xc>; - }; - - - h) Marvell Discovery MPSCINTR nodes - - Represent the Discovery's MPSC DMA interrupt hardware registers - (SDMA cause and mask registers). - - Required properties: - - reg : Offset and length of the register set for this device - - Example Discovery MPSCINTR node: - mpsintr@b800 { - reg = <0xb800 0x100>; - }; - - - i) Marvell Discovery MPSC nodes - - Represent the Discovery's MPSC (Multiprotocol Serial Controller) - serial port. - - Required properties: - - device_type : "serial" - - compatible : "marvell,mv64360-mpsc" - - reg : Offset and length of the register set for this device - - sdma : the phandle for the SDMA node used by this port - - brg : the phandle for the BRG node used by this port - - cunit : the phandle for the CUNIT node used by this port - - mpscrouting : the phandle for the MPSCROUTING node used by this port - - mpscintr : the phandle for the MPSCINTR node used by this port - - cell-index : the hardware index of this cell in the MPSC core - - max_idle : value needed for MPSC CHR3 (Maximum Frame Length) - register - - interrupts : where a is the interrupt number for the MPSC. - - interrupt-parent : the phandle for the interrupt controller - that services interrupts for this device. - - Example Discovery MPSCINTR node: - mpsc@8000 { - device_type = "serial"; - compatible = "marvell,mv64360-mpsc"; - reg = <0x8000 0x38>; - virtual-reg = <0xf1008000>; - sdma = <&SDMA0>; - brg = <&BRG0>; - cunit = <&CUNIT>; - mpscrouting = <&MPSCROUTING>; - mpscintr = <&MPSCINTR>; - cell-index = <0>; - max_idle = <40>; - interrupts = <40>; - interrupt-parent = <&PIC>; - }; - - - j) Marvell Discovery Watch Dog Timer nodes - - Represent the Discovery's watchdog timer hardware - - Required properties: - - compatible : "marvell,mv64360-wdt" - - reg : Offset and length of the register set for this device - - Example Discovery Watch Dog Timer node: - wdt@b410 { - compatible = "marvell,mv64360-wdt"; - reg = <0xb410 0x8>; - }; - - - k) Marvell Discovery I2C nodes - - Represent the Discovery's I2C hardware - - Required properties: - - device_type : "i2c" - - compatible : "marvell,mv64360-i2c" - - reg : Offset and length of the register set for this device - - interrupts : where a is the interrupt number for the I2C. - - interrupt-parent : the phandle for the interrupt controller - that services interrupts for this device. - - Example Discovery I2C node: - compatible = "marvell,mv64360-i2c"; - reg = <0xc000 0x20>; - virtual-reg = <0xf100c000>; - interrupts = <37>; - interrupt-parent = <&PIC>; - }; - - - l) Marvell Discovery PIC (Programmable Interrupt Controller) nodes - - Represent the Discovery's PIC hardware - - Required properties: - - #interrupt-cells : <1> - - #address-cells : <0> - - compatible : "marvell,mv64360-pic" - - reg : Offset and length of the register set for this device - - interrupt-controller - - Example Discovery PIC node: - pic { - #interrupt-cells = <1>; - #address-cells = <0>; - compatible = "marvell,mv64360-pic"; - reg = <0x0 0x88>; - interrupt-controller; - }; - - - m) Marvell Discovery MPP (Multipurpose Pins) multiplexing nodes - - Represent the Discovery's MPP hardware - - Required properties: - - compatible : "marvell,mv64360-mpp" - - reg : Offset and length of the register set for this device - - Example Discovery MPP node: - mpp@f000 { - compatible = "marvell,mv64360-mpp"; - reg = <0xf000 0x10>; - }; - - - n) Marvell Discovery GPP (General Purpose Pins) nodes - - Represent the Discovery's GPP hardware - - Required properties: - - compatible : "marvell,mv64360-gpp" - - reg : Offset and length of the register set for this device - - Example Discovery GPP node: - gpp@f000 { - compatible = "marvell,mv64360-gpp"; - reg = <0xf100 0x20>; - }; - - - o) Marvell Discovery PCI host bridge node - - Represents the Discovery's PCI host bridge device. The properties - for this node conform to Rev 2.1 of the PCI Bus Binding to IEEE - 1275-1994. A typical value for the compatible property is - "marvell,mv64360-pci". - - Example Discovery PCI host bridge node - pci@80000000 { - #address-cells = <3>; - #size-cells = <2>; - #interrupt-cells = <1>; - device_type = "pci"; - compatible = "marvell,mv64360-pci"; - reg = <0xcf8 0x8>; - ranges = <0x01000000 0x0 0x0 - 0x88000000 0x0 0x01000000 - 0x02000000 0x0 0x80000000 - 0x80000000 0x0 0x08000000>; - bus-range = <0 255>; - clock-frequency = <66000000>; - interrupt-parent = <&PIC>; - interrupt-map-mask = <0xf800 0x0 0x0 0x7>; - interrupt-map = < - /* IDSEL 0x0a */ - 0x5000 0 0 1 &PIC 80 - 0x5000 0 0 2 &PIC 81 - 0x5000 0 0 3 &PIC 91 - 0x5000 0 0 4 &PIC 93 - - /* IDSEL 0x0b */ - 0x5800 0 0 1 &PIC 91 - 0x5800 0 0 2 &PIC 93 - 0x5800 0 0 3 &PIC 80 - 0x5800 0 0 4 &PIC 81 - - /* IDSEL 0x0c */ - 0x6000 0 0 1 &PIC 91 - 0x6000 0 0 2 &PIC 93 - 0x6000 0 0 3 &PIC 80 - 0x6000 0 0 4 &PIC 81 - - /* IDSEL 0x0d */ - 0x6800 0 0 1 &PIC 93 - 0x6800 0 0 2 &PIC 80 - 0x6800 0 0 3 &PIC 81 - 0x6800 0 0 4 &PIC 91 - >; - }; - - - p) Marvell Discovery CPU Error nodes - - Represent the Discovery's CPU error handler device. - - Required properties: - - compatible : "marvell,mv64360-cpu-error" - - reg : Offset and length of the register set for this device - - interrupts : the interrupt number for this device - - interrupt-parent : the phandle for the interrupt controller - that services interrupts for this device. - - Example Discovery CPU Error node: - cpu-error@0070 { - compatible = "marvell,mv64360-cpu-error"; - reg = <0x70 0x10 0x128 0x28>; - interrupts = <3>; - interrupt-parent = <&PIC>; - }; - - - q) Marvell Discovery SRAM Controller nodes - - Represent the Discovery's SRAM controller device. - - Required properties: - - compatible : "marvell,mv64360-sram-ctrl" - - reg : Offset and length of the register set for this device - - interrupts : the interrupt number for this device - - interrupt-parent : the phandle for the interrupt controller - that services interrupts for this device. - - Example Discovery SRAM Controller node: - sram-ctrl@0380 { - compatible = "marvell,mv64360-sram-ctrl"; - reg = <0x380 0x80>; - interrupts = <13>; - interrupt-parent = <&PIC>; - }; - - - r) Marvell Discovery PCI Error Handler nodes - - Represent the Discovery's PCI error handler device. - - Required properties: - - compatible : "marvell,mv64360-pci-error" - - reg : Offset and length of the register set for this device - - interrupts : the interrupt number for this device - - interrupt-parent : the phandle for the interrupt controller - that services interrupts for this device. - - Example Discovery PCI Error Handler node: - pci-error@1d40 { - compatible = "marvell,mv64360-pci-error"; - reg = <0x1d40 0x40 0xc28 0x4>; - interrupts = <12>; - interrupt-parent = <&PIC>; - }; - - - s) Marvell Discovery Memory Controller nodes - - Represent the Discovery's memory controller device. - - Required properties: - - compatible : "marvell,mv64360-mem-ctrl" - - reg : Offset and length of the register set for this device - - interrupts : the interrupt number for this device - - interrupt-parent : the phandle for the interrupt controller - that services interrupts for this device. - - Example Discovery Memory Controller node: - mem-ctrl@1400 { - compatible = "marvell,mv64360-mem-ctrl"; - reg = <0x1400 0x60>; - interrupts = <17>; - interrupt-parent = <&PIC>; - }; - - -VIII - Specifying interrupt information for devices +VII - Specifying interrupt information for devices =================================================== The device tree represents the busses and devices of a hardware @@ -2439,56 +1324,7 @@ encodings listed below: 2 = high to low edge sensitive type enabled 3 = low to high edge sensitive type enabled -IX - Specifying GPIO information for devices -============================================ - -1) gpios property ------------------ - -Nodes that makes use of GPIOs should define them using `gpios' property, -format of which is: <&gpio-controller1-phandle gpio1-specifier - &gpio-controller2-phandle gpio2-specifier - 0 /* holes are permitted, means no GPIO 3 */ - &gpio-controller4-phandle gpio4-specifier - ...>; - -Note that gpio-specifier length is controller dependent. - -gpio-specifier may encode: bank, pin position inside the bank, -whether pin is open-drain and whether pin is logically inverted. - -Example of the node using GPIOs: - - node { - gpios = <&qe_pio_e 18 0>; - }; - -In this example gpio-specifier is "18 0" and encodes GPIO pin number, -and empty GPIO flags as accepted by the "qe_pio_e" gpio-controller. - -2) gpio-controller nodes ------------------------- - -Every GPIO controller node must have #gpio-cells property defined, -this information will be used to translate gpio-specifiers. - -Example of two SOC GPIO banks defined as gpio-controller nodes: - - qe_pio_a: gpio-controller@1400 { - #gpio-cells = <2>; - compatible = "fsl,qe-pario-bank-a", "fsl,qe-pario-bank"; - reg = <0x1400 0x18>; - gpio-controller; - }; - - qe_pio_e: gpio-controller@1460 { - #gpio-cells = <2>; - compatible = "fsl,qe-pario-bank-e", "fsl,qe-pario-bank"; - reg = <0x1460 0x18>; - gpio-controller; - }; - -X - Specifying Device Power Management Information (sleep property) +VIII - Specifying Device Power Management Information (sleep property) =================================================================== Devices on SOCs often have mechanisms for placing devices into low-power diff --git a/Documentation/powerpc/dts-bindings/4xx/emac.txt b/Documentation/powerpc/dts-bindings/4xx/emac.txt new file mode 100644 index 000000000000..2161334a7ca5 --- /dev/null +++ b/Documentation/powerpc/dts-bindings/4xx/emac.txt @@ -0,0 +1,148 @@ + 4xx/Axon EMAC ethernet nodes + + The EMAC ethernet controller in IBM and AMCC 4xx chips, and also + the Axon bridge. To operate this needs to interact with a ths + special McMAL DMA controller, and sometimes an RGMII or ZMII + interface. In addition to the nodes and properties described + below, the node for the OPB bus on which the EMAC sits must have a + correct clock-frequency property. + + i) The EMAC node itself + + Required properties: + - device_type : "network" + + - compatible : compatible list, contains 2 entries, first is + "ibm,emac-CHIP" where CHIP is the host ASIC (440gx, + 405gp, Axon) and second is either "ibm,emac" or + "ibm,emac4". For Axon, thus, we have: "ibm,emac-axon", + "ibm,emac4" + - interrupts : + - interrupt-parent : optional, if needed for interrupt mapping + - reg : + - local-mac-address : 6 bytes, MAC address + - mal-device : phandle of the associated McMAL node + - mal-tx-channel : 1 cell, index of the tx channel on McMAL associated + with this EMAC + - mal-rx-channel : 1 cell, index of the rx channel on McMAL associated + with this EMAC + - cell-index : 1 cell, hardware index of the EMAC cell on a given + ASIC (typically 0x0 and 0x1 for EMAC0 and EMAC1 on + each Axon chip) + - max-frame-size : 1 cell, maximum frame size supported in bytes + - rx-fifo-size : 1 cell, Rx fifo size in bytes for 10 and 100 Mb/sec + operations. + For Axon, 2048 + - tx-fifo-size : 1 cell, Tx fifo size in bytes for 10 and 100 Mb/sec + operations. + For Axon, 2048. + - fifo-entry-size : 1 cell, size of a fifo entry (used to calculate + thresholds). + For Axon, 0x00000010 + - mal-burst-size : 1 cell, MAL burst size (used to calculate thresholds) + in bytes. + For Axon, 0x00000100 (I think ...) + - phy-mode : string, mode of operations of the PHY interface. + Supported values are: "mii", "rmii", "smii", "rgmii", + "tbi", "gmii", rtbi", "sgmii". + For Axon on CAB, it is "rgmii" + - mdio-device : 1 cell, required iff using shared MDIO registers + (440EP). phandle of the EMAC to use to drive the + MDIO lines for the PHY used by this EMAC. + - zmii-device : 1 cell, required iff connected to a ZMII. phandle of + the ZMII device node + - zmii-channel : 1 cell, required iff connected to a ZMII. Which ZMII + channel or 0xffffffff if ZMII is only used for MDIO. + - rgmii-device : 1 cell, required iff connected to an RGMII. phandle + of the RGMII device node. + For Axon: phandle of plb5/plb4/opb/rgmii + - rgmii-channel : 1 cell, required iff connected to an RGMII. Which + RGMII channel is used by this EMAC. + Fox Axon: present, whatever value is appropriate for each + EMAC, that is the content of the current (bogus) "phy-port" + property. + + Optional properties: + - phy-address : 1 cell, optional, MDIO address of the PHY. If absent, + a search is performed. + - phy-map : 1 cell, optional, bitmap of addresses to probe the PHY + for, used if phy-address is absent. bit 0x00000001 is + MDIO address 0. + For Axon it can be absent, though my current driver + doesn't handle phy-address yet so for now, keep + 0x00ffffff in it. + - rx-fifo-size-gige : 1 cell, Rx fifo size in bytes for 1000 Mb/sec + operations (if absent the value is the same as + rx-fifo-size). For Axon, either absent or 2048. + - tx-fifo-size-gige : 1 cell, Tx fifo size in bytes for 1000 Mb/sec + operations (if absent the value is the same as + tx-fifo-size). For Axon, either absent or 2048. + - tah-device : 1 cell, optional. If connected to a TAH engine for + offload, phandle of the TAH device node. + - tah-channel : 1 cell, optional. If appropriate, channel used on the + TAH engine. + + Example: + + EMAC0: ethernet@40000800 { + device_type = "network"; + compatible = "ibm,emac-440gp", "ibm,emac"; + interrupt-parent = <&UIC1>; + interrupts = <1c 4 1d 4>; + reg = <40000800 70>; + local-mac-address = [00 04 AC E3 1B 1E]; + mal-device = <&MAL0>; + mal-tx-channel = <0 1>; + mal-rx-channel = <0>; + cell-index = <0>; + max-frame-size = <5dc>; + rx-fifo-size = <1000>; + tx-fifo-size = <800>; + phy-mode = "rmii"; + phy-map = <00000001>; + zmii-device = <&ZMII0>; + zmii-channel = <0>; + }; + + ii) McMAL node + + Required properties: + - device_type : "dma-controller" + - compatible : compatible list, containing 2 entries, first is + "ibm,mcmal-CHIP" where CHIP is the host ASIC (like + emac) and the second is either "ibm,mcmal" or + "ibm,mcmal2". + For Axon, "ibm,mcmal-axon","ibm,mcmal2" + - interrupts : . + For Axon: This is _different_ from the current + firmware. We use the "delayed" interrupts for txeob + and rxeob. Thus we end up with mapping those 5 MPIC + interrupts, all level positive sensitive: 10, 11, 32, + 33, 34 (in decimal) + - dcr-reg : < DCR registers range > + - dcr-parent : if needed for dcr-reg + - num-tx-chans : 1 cell, number of Tx channels + - num-rx-chans : 1 cell, number of Rx channels + + iii) ZMII node + + Required properties: + - compatible : compatible list, containing 2 entries, first is + "ibm,zmii-CHIP" where CHIP is the host ASIC (like + EMAC) and the second is "ibm,zmii". + For Axon, there is no ZMII node. + - reg : + + iv) RGMII node + + Required properties: + - compatible : compatible list, containing 2 entries, first is + "ibm,rgmii-CHIP" where CHIP is the host ASIC (like + EMAC) and the second is "ibm,rgmii". + For Axon, "ibm,rgmii-axon","ibm,rgmii" + - reg : + - revision : as provided by the RGMII new version register if + available. + For Axon: 0x0000012a + diff --git a/Documentation/powerpc/dts-bindings/gpio/gpio.txt b/Documentation/powerpc/dts-bindings/gpio/gpio.txt new file mode 100644 index 000000000000..edaa84d288a1 --- /dev/null +++ b/Documentation/powerpc/dts-bindings/gpio/gpio.txt @@ -0,0 +1,50 @@ +Specifying GPIO information for devices +============================================ + +1) gpios property +----------------- + +Nodes that makes use of GPIOs should define them using `gpios' property, +format of which is: <&gpio-controller1-phandle gpio1-specifier + &gpio-controller2-phandle gpio2-specifier + 0 /* holes are permitted, means no GPIO 3 */ + &gpio-controller4-phandle gpio4-specifier + ...>; + +Note that gpio-specifier length is controller dependent. + +gpio-specifier may encode: bank, pin position inside the bank, +whether pin is open-drain and whether pin is logically inverted. + +Example of the node using GPIOs: + + node { + gpios = <&qe_pio_e 18 0>; + }; + +In this example gpio-specifier is "18 0" and encodes GPIO pin number, +and empty GPIO flags as accepted by the "qe_pio_e" gpio-controller. + +2) gpio-controller nodes +------------------------ + +Every GPIO controller node must have #gpio-cells property defined, +this information will be used to translate gpio-specifiers. + +Example of two SOC GPIO banks defined as gpio-controller nodes: + + qe_pio_a: gpio-controller@1400 { + #gpio-cells = <2>; + compatible = "fsl,qe-pario-bank-a", "fsl,qe-pario-bank"; + reg = <0x1400 0x18>; + gpio-controller; + }; + + qe_pio_e: gpio-controller@1460 { + #gpio-cells = <2>; + compatible = "fsl,qe-pario-bank-e", "fsl,qe-pario-bank"; + reg = <0x1460 0x18>; + gpio-controller; + }; + + diff --git a/Documentation/powerpc/dts-bindings/gpio/mdio.txt b/Documentation/powerpc/dts-bindings/gpio/mdio.txt new file mode 100644 index 000000000000..bc9549529014 --- /dev/null +++ b/Documentation/powerpc/dts-bindings/gpio/mdio.txt @@ -0,0 +1,19 @@ +MDIO on GPIOs + +Currently defined compatibles: +- virtual,gpio-mdio + +MDC and MDIO lines connected to GPIO controllers are listed in the +gpios property as described in section VIII.1 in the following order: + +MDC, MDIO. + +Example: + +mdio { + compatible = "virtual,mdio-gpio"; + #address-cells = <1>; + #size-cells = <0>; + gpios = <&qe_pio_a 11 + &qe_pio_c 6>; +}; diff --git a/Documentation/powerpc/dts-bindings/marvell.txt b/Documentation/powerpc/dts-bindings/marvell.txt new file mode 100644 index 000000000000..3708a2fd4747 --- /dev/null +++ b/Documentation/powerpc/dts-bindings/marvell.txt @@ -0,0 +1,521 @@ +Marvell Discovery mv64[345]6x System Controller chips +=========================================================== + +The Marvell mv64[345]60 series of system controller chips contain +many of the peripherals needed to implement a complete computer +system. In this section, we define device tree nodes to describe +the system controller chip itself and each of the peripherals +which it contains. Compatible string values for each node are +prefixed with the string "marvell,", for Marvell Technology Group Ltd. + +1) The /system-controller node + + This node is used to represent the system-controller and must be + present when the system uses a system controller chip. The top-level + system-controller node contains information that is global to all + devices within the system controller chip. The node name begins + with "system-controller" followed by the unit address, which is + the base address of the memory-mapped register set for the system + controller chip. + + Required properties: + + - ranges : Describes the translation of system controller addresses + for memory mapped registers. + - clock-frequency: Contains the main clock frequency for the system + controller chip. + - reg : This property defines the address and size of the + memory-mapped registers contained within the system controller + chip. The address specified in the "reg" property should match + the unit address of the system-controller node. + - #address-cells : Address representation for system controller + devices. This field represents the number of cells needed to + represent the address of the memory-mapped registers of devices + within the system controller chip. + - #size-cells : Size representation for for the memory-mapped + registers within the system controller chip. + - #interrupt-cells : Defines the width of cells used to represent + interrupts. + + Optional properties: + + - model : The specific model of the system controller chip. Such + as, "mv64360", "mv64460", or "mv64560". + - compatible : A string identifying the compatibility identifiers + of the system controller chip. + + The system-controller node contains child nodes for each system + controller device that the platform uses. Nodes should not be created + for devices which exist on the system controller chip but are not used + + Example Marvell Discovery mv64360 system-controller node: + + system-controller@f1000000 { /* Marvell Discovery mv64360 */ + #address-cells = <1>; + #size-cells = <1>; + model = "mv64360"; /* Default */ + compatible = "marvell,mv64360"; + clock-frequency = <133333333>; + reg = <0xf1000000 0x10000>; + virtual-reg = <0xf1000000>; + ranges = <0x88000000 0x88000000 0x1000000 /* PCI 0 I/O Space */ + 0x80000000 0x80000000 0x8000000 /* PCI 0 MEM Space */ + 0xa0000000 0xa0000000 0x4000000 /* User FLASH */ + 0x00000000 0xf1000000 0x0010000 /* Bridge's regs */ + 0xf2000000 0xf2000000 0x0040000>;/* Integrated SRAM */ + + [ child node definitions... ] + } + +2) Child nodes of /system-controller + + a) Marvell Discovery MDIO bus + + The MDIO is a bus to which the PHY devices are connected. For each + device that exists on this bus, a child node should be created. See + the definition of the PHY node below for an example of how to define + a PHY. + + Required properties: + - #address-cells : Should be <1> + - #size-cells : Should be <0> + - device_type : Should be "mdio" + - compatible : Should be "marvell,mv64360-mdio" + + Example: + + mdio { + #address-cells = <1>; + #size-cells = <0>; + device_type = "mdio"; + compatible = "marvell,mv64360-mdio"; + + ethernet-phy@0 { + ...... + }; + }; + + + b) Marvell Discovery ethernet controller + + The Discover ethernet controller is described with two levels + of nodes. The first level describes an ethernet silicon block + and the second level describes up to 3 ethernet nodes within + that block. The reason for the multiple levels is that the + registers for the node are interleaved within a single set + of registers. The "ethernet-block" level describes the + shared register set, and the "ethernet" nodes describe ethernet + port-specific properties. + + Ethernet block node + + Required properties: + - #address-cells : <1> + - #size-cells : <0> + - compatible : "marvell,mv64360-eth-block" + - reg : Offset and length of the register set for this block + + Example Discovery Ethernet block node: + ethernet-block@2000 { + #address-cells = <1>; + #size-cells = <0>; + compatible = "marvell,mv64360-eth-block"; + reg = <0x2000 0x2000>; + ethernet@0 { + ....... + }; + }; + + Ethernet port node + + Required properties: + - device_type : Should be "network". + - compatible : Should be "marvell,mv64360-eth". + - reg : Should be <0>, <1>, or <2>, according to which registers + within the silicon block the device uses. + - interrupts : where a is the interrupt number for the port. + - interrupt-parent : the phandle for the interrupt controller + that services interrupts for this device. + - phy : the phandle for the PHY connected to this ethernet + controller. + - local-mac-address : 6 bytes, MAC address + + Example Discovery Ethernet port node: + ethernet@0 { + device_type = "network"; + compatible = "marvell,mv64360-eth"; + reg = <0>; + interrupts = <32>; + interrupt-parent = <&PIC>; + phy = <&PHY0>; + local-mac-address = [ 00 00 00 00 00 00 ]; + }; + + + + c) Marvell Discovery PHY nodes + + Required properties: + - device_type : Should be "ethernet-phy" + - interrupts : where a is the interrupt number for this phy. + - interrupt-parent : the phandle for the interrupt controller that + services interrupts for this device. + - reg : The ID number for the phy, usually a small integer + + Example Discovery PHY node: + ethernet-phy@1 { + device_type = "ethernet-phy"; + compatible = "broadcom,bcm5421"; + interrupts = <76>; /* GPP 12 */ + interrupt-parent = <&PIC>; + reg = <1>; + }; + + + d) Marvell Discovery SDMA nodes + + Represent DMA hardware associated with the MPSC (multiprotocol + serial controllers). + + Required properties: + - compatible : "marvell,mv64360-sdma" + - reg : Offset and length of the register set for this device + - interrupts : where a is the interrupt number for the DMA + device. + - interrupt-parent : the phandle for the interrupt controller + that services interrupts for this device. + + Example Discovery SDMA node: + sdma@4000 { + compatible = "marvell,mv64360-sdma"; + reg = <0x4000 0xc18>; + virtual-reg = <0xf1004000>; + interrupts = <36>; + interrupt-parent = <&PIC>; + }; + + + e) Marvell Discovery BRG nodes + + Represent baud rate generator hardware associated with the MPSC + (multiprotocol serial controllers). + + Required properties: + - compatible : "marvell,mv64360-brg" + - reg : Offset and length of the register set for this device + - clock-src : A value from 0 to 15 which selects the clock + source for the baud rate generator. This value corresponds + to the CLKS value in the BRGx configuration register. See + the mv64x60 User's Manual. + - clock-frequence : The frequency (in Hz) of the baud rate + generator's input clock. + - current-speed : The current speed setting (presumably by + firmware) of the baud rate generator. + + Example Discovery BRG node: + brg@b200 { + compatible = "marvell,mv64360-brg"; + reg = <0xb200 0x8>; + clock-src = <8>; + clock-frequency = <133333333>; + current-speed = <9600>; + }; + + + f) Marvell Discovery CUNIT nodes + + Represent the Serial Communications Unit device hardware. + + Required properties: + - reg : Offset and length of the register set for this device + + Example Discovery CUNIT node: + cunit@f200 { + reg = <0xf200 0x200>; + }; + + + g) Marvell Discovery MPSCROUTING nodes + + Represent the Discovery's MPSC routing hardware + + Required properties: + - reg : Offset and length of the register set for this device + + Example Discovery CUNIT node: + mpscrouting@b500 { + reg = <0xb400 0xc>; + }; + + + h) Marvell Discovery MPSCINTR nodes + + Represent the Discovery's MPSC DMA interrupt hardware registers + (SDMA cause and mask registers). + + Required properties: + - reg : Offset and length of the register set for this device + + Example Discovery MPSCINTR node: + mpsintr@b800 { + reg = <0xb800 0x100>; + }; + + + i) Marvell Discovery MPSC nodes + + Represent the Discovery's MPSC (Multiprotocol Serial Controller) + serial port. + + Required properties: + - device_type : "serial" + - compatible : "marvell,mv64360-mpsc" + - reg : Offset and length of the register set for this device + - sdma : the phandle for the SDMA node used by this port + - brg : the phandle for the BRG node used by this port + - cunit : the phandle for the CUNIT node used by this port + - mpscrouting : the phandle for the MPSCROUTING node used by this port + - mpscintr : the phandle for the MPSCINTR node used by this port + - cell-index : the hardware index of this cell in the MPSC core + - max_idle : value needed for MPSC CHR3 (Maximum Frame Length) + register + - interrupts : where a is the interrupt number for the MPSC. + - interrupt-parent : the phandle for the interrupt controller + that services interrupts for this device. + + Example Discovery MPSCINTR node: + mpsc@8000 { + device_type = "serial"; + compatible = "marvell,mv64360-mpsc"; + reg = <0x8000 0x38>; + virtual-reg = <0xf1008000>; + sdma = <&SDMA0>; + brg = <&BRG0>; + cunit = <&CUNIT>; + mpscrouting = <&MPSCROUTING>; + mpscintr = <&MPSCINTR>; + cell-index = <0>; + max_idle = <40>; + interrupts = <40>; + interrupt-parent = <&PIC>; + }; + + + j) Marvell Discovery Watch Dog Timer nodes + + Represent the Discovery's watchdog timer hardware + + Required properties: + - compatible : "marvell,mv64360-wdt" + - reg : Offset and length of the register set for this device + + Example Discovery Watch Dog Timer node: + wdt@b410 { + compatible = "marvell,mv64360-wdt"; + reg = <0xb410 0x8>; + }; + + + k) Marvell Discovery I2C nodes + + Represent the Discovery's I2C hardware + + Required properties: + - device_type : "i2c" + - compatible : "marvell,mv64360-i2c" + - reg : Offset and length of the register set for this device + - interrupts : where a is the interrupt number for the I2C. + - interrupt-parent : the phandle for the interrupt controller + that services interrupts for this device. + + Example Discovery I2C node: + compatible = "marvell,mv64360-i2c"; + reg = <0xc000 0x20>; + virtual-reg = <0xf100c000>; + interrupts = <37>; + interrupt-parent = <&PIC>; + }; + + + l) Marvell Discovery PIC (Programmable Interrupt Controller) nodes + + Represent the Discovery's PIC hardware + + Required properties: + - #interrupt-cells : <1> + - #address-cells : <0> + - compatible : "marvell,mv64360-pic" + - reg : Offset and length of the register set for this device + - interrupt-controller + + Example Discovery PIC node: + pic { + #interrupt-cells = <1>; + #address-cells = <0>; + compatible = "marvell,mv64360-pic"; + reg = <0x0 0x88>; + interrupt-controller; + }; + + + m) Marvell Discovery MPP (Multipurpose Pins) multiplexing nodes + + Represent the Discovery's MPP hardware + + Required properties: + - compatible : "marvell,mv64360-mpp" + - reg : Offset and length of the register set for this device + + Example Discovery MPP node: + mpp@f000 { + compatible = "marvell,mv64360-mpp"; + reg = <0xf000 0x10>; + }; + + + n) Marvell Discovery GPP (General Purpose Pins) nodes + + Represent the Discovery's GPP hardware + + Required properties: + - compatible : "marvell,mv64360-gpp" + - reg : Offset and length of the register set for this device + + Example Discovery GPP node: + gpp@f000 { + compatible = "marvell,mv64360-gpp"; + reg = <0xf100 0x20>; + }; + + + o) Marvell Discovery PCI host bridge node + + Represents the Discovery's PCI host bridge device. The properties + for this node conform to Rev 2.1 of the PCI Bus Binding to IEEE + 1275-1994. A typical value for the compatible property is + "marvell,mv64360-pci". + + Example Discovery PCI host bridge node + pci@80000000 { + #address-cells = <3>; + #size-cells = <2>; + #interrupt-cells = <1>; + device_type = "pci"; + compatible = "marvell,mv64360-pci"; + reg = <0xcf8 0x8>; + ranges = <0x01000000 0x0 0x0 + 0x88000000 0x0 0x01000000 + 0x02000000 0x0 0x80000000 + 0x80000000 0x0 0x08000000>; + bus-range = <0 255>; + clock-frequency = <66000000>; + interrupt-parent = <&PIC>; + interrupt-map-mask = <0xf800 0x0 0x0 0x7>; + interrupt-map = < + /* IDSEL 0x0a */ + 0x5000 0 0 1 &PIC 80 + 0x5000 0 0 2 &PIC 81 + 0x5000 0 0 3 &PIC 91 + 0x5000 0 0 4 &PIC 93 + + /* IDSEL 0x0b */ + 0x5800 0 0 1 &PIC 91 + 0x5800 0 0 2 &PIC 93 + 0x5800 0 0 3 &PIC 80 + 0x5800 0 0 4 &PIC 81 + + /* IDSEL 0x0c */ + 0x6000 0 0 1 &PIC 91 + 0x6000 0 0 2 &PIC 93 + 0x6000 0 0 3 &PIC 80 + 0x6000 0 0 4 &PIC 81 + + /* IDSEL 0x0d */ + 0x6800 0 0 1 &PIC 93 + 0x6800 0 0 2 &PIC 80 + 0x6800 0 0 3 &PIC 81 + 0x6800 0 0 4 &PIC 91 + >; + }; + + + p) Marvell Discovery CPU Error nodes + + Represent the Discovery's CPU error handler device. + + Required properties: + - compatible : "marvell,mv64360-cpu-error" + - reg : Offset and length of the register set for this device + - interrupts : the interrupt number for this device + - interrupt-parent : the phandle for the interrupt controller + that services interrupts for this device. + + Example Discovery CPU Error node: + cpu-error@0070 { + compatible = "marvell,mv64360-cpu-error"; + reg = <0x70 0x10 0x128 0x28>; + interrupts = <3>; + interrupt-parent = <&PIC>; + }; + + + q) Marvell Discovery SRAM Controller nodes + + Represent the Discovery's SRAM controller device. + + Required properties: + - compatible : "marvell,mv64360-sram-ctrl" + - reg : Offset and length of the register set for this device + - interrupts : the interrupt number for this device + - interrupt-parent : the phandle for the interrupt controller + that services interrupts for this device. + + Example Discovery SRAM Controller node: + sram-ctrl@0380 { + compatible = "marvell,mv64360-sram-ctrl"; + reg = <0x380 0x80>; + interrupts = <13>; + interrupt-parent = <&PIC>; + }; + + + r) Marvell Discovery PCI Error Handler nodes + + Represent the Discovery's PCI error handler device. + + Required properties: + - compatible : "marvell,mv64360-pci-error" + - reg : Offset and length of the register set for this device + - interrupts : the interrupt number for this device + - interrupt-parent : the phandle for the interrupt controller + that services interrupts for this device. + + Example Discovery PCI Error Handler node: + pci-error@1d40 { + compatible = "marvell,mv64360-pci-error"; + reg = <0x1d40 0x40 0xc28 0x4>; + interrupts = <12>; + interrupt-parent = <&PIC>; + }; + + + s) Marvell Discovery Memory Controller nodes + + Represent the Discovery's memory controller device. + + Required properties: + - compatible : "marvell,mv64360-mem-ctrl" + - reg : Offset and length of the register set for this device + - interrupts : the interrupt number for this device + - interrupt-parent : the phandle for the interrupt controller + that services interrupts for this device. + + Example Discovery Memory Controller node: + mem-ctrl@1400 { + compatible = "marvell,mv64360-mem-ctrl"; + reg = <0x1400 0x60>; + interrupts = <17>; + interrupt-parent = <&PIC>; + }; + + diff --git a/Documentation/powerpc/dts-bindings/phy.txt b/Documentation/powerpc/dts-bindings/phy.txt new file mode 100644 index 000000000000..bb8c742eb8c5 --- /dev/null +++ b/Documentation/powerpc/dts-bindings/phy.txt @@ -0,0 +1,25 @@ +PHY nodes + +Required properties: + + - device_type : Should be "ethernet-phy" + - interrupts : where a is the interrupt number and b is a + field that represents an encoding of the sense and level + information for the interrupt. This should be encoded based on + the information in section 2) depending on the type of interrupt + controller you have. + - interrupt-parent : the phandle for the interrupt controller that + services interrupts for this device. + - reg : The ID number for the phy, usually a small integer + - linux,phandle : phandle for this node; likely referenced by an + ethernet controller node. + +Example: + +ethernet-phy@0 { + linux,phandle = <2452000> + interrupt-parent = <40000>; + interrupts = <35 1>; + reg = <0>; + device_type = "ethernet-phy"; +}; diff --git a/Documentation/powerpc/dts-bindings/spi-bus.txt b/Documentation/powerpc/dts-bindings/spi-bus.txt new file mode 100644 index 000000000000..e782add2e457 --- /dev/null +++ b/Documentation/powerpc/dts-bindings/spi-bus.txt @@ -0,0 +1,57 @@ +SPI (Serial Peripheral Interface) busses + +SPI busses can be described with a node for the SPI master device +and a set of child nodes for each SPI slave on the bus. For this +discussion, it is assumed that the system's SPI controller is in +SPI master mode. This binding does not describe SPI controllers +in slave mode. + +The SPI master node requires the following properties: +- #address-cells - number of cells required to define a chip select + address on the SPI bus. +- #size-cells - should be zero. +- compatible - name of SPI bus controller following generic names + recommended practice. +No other properties are required in the SPI bus node. It is assumed +that a driver for an SPI bus device will understand that it is an SPI bus. +However, the binding does not attempt to define the specific method for +assigning chip select numbers. Since SPI chip select configuration is +flexible and non-standardized, it is left out of this binding with the +assumption that board specific platform code will be used to manage +chip selects. Individual drivers can define additional properties to +support describing the chip select layout. + +SPI slave nodes must be children of the SPI master node and can +contain the following properties. +- reg - (required) chip select address of device. +- compatible - (required) name of SPI device following generic names + recommended practice +- spi-max-frequency - (required) Maximum SPI clocking speed of device in Hz +- spi-cpol - (optional) Empty property indicating device requires + inverse clock polarity (CPOL) mode +- spi-cpha - (optional) Empty property indicating device requires + shifted clock phase (CPHA) mode +- spi-cs-high - (optional) Empty property indicating device requires + chip select active high + +SPI example for an MPC5200 SPI bus: + spi@f00 { + #address-cells = <1>; + #size-cells = <0>; + compatible = "fsl,mpc5200b-spi","fsl,mpc5200-spi"; + reg = <0xf00 0x20>; + interrupts = <2 13 0 2 14 0>; + interrupt-parent = <&mpc5200_pic>; + + ethernet-switch@0 { + compatible = "micrel,ks8995m"; + spi-max-frequency = <1000000>; + reg = <0>; + }; + + codec@1 { + compatible = "ti,tlv320aic26"; + spi-max-frequency = <100000>; + reg = <1>; + }; + }; diff --git a/Documentation/powerpc/dts-bindings/usb-ehci.txt b/Documentation/powerpc/dts-bindings/usb-ehci.txt new file mode 100644 index 000000000000..fa18612f757b --- /dev/null +++ b/Documentation/powerpc/dts-bindings/usb-ehci.txt @@ -0,0 +1,25 @@ +USB EHCI controllers + +Required properties: + - compatible : should be "usb-ehci". + - reg : should contain at least address and length of the standard EHCI + register set for the device. Optional platform-dependent registers + (debug-port or other) can be also specified here, but only after + definition of standard EHCI registers. + - interrupts : one EHCI interrupt should be described here. +If device registers are implemented in big endian mode, the device +node should have "big-endian-regs" property. +If controller implementation operates with big endian descriptors, +"big-endian-desc" property should be specified. +If both big endian registers and descriptors are used by the controller +implementation, "big-endian" property can be specified instead of having +both "big-endian-regs" and "big-endian-desc". + +Example (Sequoia 440EPx): + ehci@e0000300 { + compatible = "ibm,usb-ehci-440epx", "usb-ehci"; + interrupt-parent = <&UIC0>; + interrupts = <1a 4>; + reg = <0 e0000300 90 0 e0000390 70>; + big-endian; + }; diff --git a/Documentation/powerpc/dts-bindings/xilinx.txt b/Documentation/powerpc/dts-bindings/xilinx.txt new file mode 100644 index 000000000000..80339fe4300b --- /dev/null +++ b/Documentation/powerpc/dts-bindings/xilinx.txt @@ -0,0 +1,295 @@ + d) Xilinx IP cores + + The Xilinx EDK toolchain ships with a set of IP cores (devices) for use + in Xilinx Spartan and Virtex FPGAs. The devices cover the whole range + of standard device types (network, serial, etc.) and miscellaneous + devices (gpio, LCD, spi, etc). Also, since these devices are + implemented within the fpga fabric every instance of the device can be + synthesised with different options that change the behaviour. + + Each IP-core has a set of parameters which the FPGA designer can use to + control how the core is synthesized. Historically, the EDK tool would + extract the device parameters relevant to device drivers and copy them + into an 'xparameters.h' in the form of #define symbols. This tells the + device drivers how the IP cores are configured, but it requres the kernel + to be recompiled every time the FPGA bitstream is resynthesized. + + The new approach is to export the parameters into the device tree and + generate a new device tree each time the FPGA bitstream changes. The + parameters which used to be exported as #defines will now become + properties of the device node. In general, device nodes for IP-cores + will take the following form: + + (name): (generic-name)@(base-address) { + compatible = "xlnx,(ip-core-name)-(HW_VER)" + [, (list of compatible devices), ...]; + reg = <(baseaddr) (size)>; + interrupt-parent = <&interrupt-controller-phandle>; + interrupts = < ... >; + xlnx,(parameter1) = "(string-value)"; + xlnx,(parameter2) = <(int-value)>; + }; + + (generic-name): an open firmware-style name that describes the + generic class of device. Preferably, this is one word, such + as 'serial' or 'ethernet'. + (ip-core-name): the name of the ip block (given after the BEGIN + directive in system.mhs). Should be in lowercase + and all underscores '_' converted to dashes '-'. + (name): is derived from the "PARAMETER INSTANCE" value. + (parameter#): C_* parameters from system.mhs. The C_ prefix is + dropped from the parameter name, the name is converted + to lowercase and all underscore '_' characters are + converted to dashes '-'. + (baseaddr): the baseaddr parameter value (often named C_BASEADDR). + (HW_VER): from the HW_VER parameter. + (size): the address range size (often C_HIGHADDR - C_BASEADDR + 1). + + Typically, the compatible list will include the exact IP core version + followed by an older IP core version which implements the same + interface or any other device with the same interface. + + 'reg', 'interrupt-parent' and 'interrupts' are all optional properties. + + For example, the following block from system.mhs: + + BEGIN opb_uartlite + PARAMETER INSTANCE = opb_uartlite_0 + PARAMETER HW_VER = 1.00.b + PARAMETER C_BAUDRATE = 115200 + PARAMETER C_DATA_BITS = 8 + PARAMETER C_ODD_PARITY = 0 + PARAMETER C_USE_PARITY = 0 + PARAMETER C_CLK_FREQ = 50000000 + PARAMETER C_BASEADDR = 0xEC100000 + PARAMETER C_HIGHADDR = 0xEC10FFFF + BUS_INTERFACE SOPB = opb_7 + PORT OPB_Clk = CLK_50MHz + PORT Interrupt = opb_uartlite_0_Interrupt + PORT RX = opb_uartlite_0_RX + PORT TX = opb_uartlite_0_TX + PORT OPB_Rst = sys_bus_reset_0 + END + + becomes the following device tree node: + + opb_uartlite_0: serial@ec100000 { + device_type = "serial"; + compatible = "xlnx,opb-uartlite-1.00.b"; + reg = ; + interrupt-parent = <&opb_intc_0>; + interrupts = <1 0>; // got this from the opb_intc parameters + current-speed = ; // standard serial device prop + clock-frequency = ; // standard serial device prop + xlnx,data-bits = <8>; + xlnx,odd-parity = <0>; + xlnx,use-parity = <0>; + }; + + Some IP cores actually implement 2 or more logical devices. In + this case, the device should still describe the whole IP core with + a single node and add a child node for each logical device. The + ranges property can be used to translate from parent IP-core to the + registers of each device. In addition, the parent node should be + compatible with the bus type 'xlnx,compound', and should contain + #address-cells and #size-cells, as with any other bus. (Note: this + makes the assumption that both logical devices have the same bus + binding. If this is not true, then separate nodes should be used + for each logical device). The 'cell-index' property can be used to + enumerate logical devices within an IP core. For example, the + following is the system.mhs entry for the dual ps2 controller found + on the ml403 reference design. + + BEGIN opb_ps2_dual_ref + PARAMETER INSTANCE = opb_ps2_dual_ref_0 + PARAMETER HW_VER = 1.00.a + PARAMETER C_BASEADDR = 0xA9000000 + PARAMETER C_HIGHADDR = 0xA9001FFF + BUS_INTERFACE SOPB = opb_v20_0 + PORT Sys_Intr1 = ps2_1_intr + PORT Sys_Intr2 = ps2_2_intr + PORT Clkin1 = ps2_clk_rx_1 + PORT Clkin2 = ps2_clk_rx_2 + PORT Clkpd1 = ps2_clk_tx_1 + PORT Clkpd2 = ps2_clk_tx_2 + PORT Rx1 = ps2_d_rx_1 + PORT Rx2 = ps2_d_rx_2 + PORT Txpd1 = ps2_d_tx_1 + PORT Txpd2 = ps2_d_tx_2 + END + + It would result in the following device tree nodes: + + opb_ps2_dual_ref_0: opb-ps2-dual-ref@a9000000 { + #address-cells = <1>; + #size-cells = <1>; + compatible = "xlnx,compound"; + ranges = <0 a9000000 2000>; + // If this device had extra parameters, then they would + // go here. + ps2@0 { + compatible = "xlnx,opb-ps2-dual-ref-1.00.a"; + reg = <0 40>; + interrupt-parent = <&opb_intc_0>; + interrupts = <3 0>; + cell-index = <0>; + }; + ps2@1000 { + compatible = "xlnx,opb-ps2-dual-ref-1.00.a"; + reg = <1000 40>; + interrupt-parent = <&opb_intc_0>; + interrupts = <3 0>; + cell-index = <0>; + }; + }; + + Also, the system.mhs file defines bus attachments from the processor + to the devices. The device tree structure should reflect the bus + attachments. Again an example; this system.mhs fragment: + + BEGIN ppc405_virtex4 + PARAMETER INSTANCE = ppc405_0 + PARAMETER HW_VER = 1.01.a + BUS_INTERFACE DPLB = plb_v34_0 + BUS_INTERFACE IPLB = plb_v34_0 + END + + BEGIN opb_intc + PARAMETER INSTANCE = opb_intc_0 + PARAMETER HW_VER = 1.00.c + PARAMETER C_BASEADDR = 0xD1000FC0 + PARAMETER C_HIGHADDR = 0xD1000FDF + BUS_INTERFACE SOPB = opb_v20_0 + END + + BEGIN opb_uart16550 + PARAMETER INSTANCE = opb_uart16550_0 + PARAMETER HW_VER = 1.00.d + PARAMETER C_BASEADDR = 0xa0000000 + PARAMETER C_HIGHADDR = 0xa0001FFF + BUS_INTERFACE SOPB = opb_v20_0 + END + + BEGIN plb_v34 + PARAMETER INSTANCE = plb_v34_0 + PARAMETER HW_VER = 1.02.a + END + + BEGIN plb_bram_if_cntlr + PARAMETER INSTANCE = plb_bram_if_cntlr_0 + PARAMETER HW_VER = 1.00.b + PARAMETER C_BASEADDR = 0xFFFF0000 + PARAMETER C_HIGHADDR = 0xFFFFFFFF + BUS_INTERFACE SPLB = plb_v34_0 + END + + BEGIN plb2opb_bridge + PARAMETER INSTANCE = plb2opb_bridge_0 + PARAMETER HW_VER = 1.01.a + PARAMETER C_RNG0_BASEADDR = 0x20000000 + PARAMETER C_RNG0_HIGHADDR = 0x3FFFFFFF + PARAMETER C_RNG1_BASEADDR = 0x60000000 + PARAMETER C_RNG1_HIGHADDR = 0x7FFFFFFF + PARAMETER C_RNG2_BASEADDR = 0x80000000 + PARAMETER C_RNG2_HIGHADDR = 0xBFFFFFFF + PARAMETER C_RNG3_BASEADDR = 0xC0000000 + PARAMETER C_RNG3_HIGHADDR = 0xDFFFFFFF + BUS_INTERFACE SPLB = plb_v34_0 + BUS_INTERFACE MOPB = opb_v20_0 + END + + Gives this device tree (some properties removed for clarity): + + plb@0 { + #address-cells = <1>; + #size-cells = <1>; + compatible = "xlnx,plb-v34-1.02.a"; + device_type = "ibm,plb"; + ranges; // 1:1 translation + + plb_bram_if_cntrl_0: bram@ffff0000 { + reg = ; + } + + opb@20000000 { + #address-cells = <1>; + #size-cells = <1>; + ranges = <20000000 20000000 20000000 + 60000000 60000000 20000000 + 80000000 80000000 40000000 + c0000000 c0000000 20000000>; + + opb_uart16550_0: serial@a0000000 { + reg = ; + }; + + opb_intc_0: interrupt-controller@d1000fc0 { + reg = ; + }; + }; + }; + + That covers the general approach to binding xilinx IP cores into the + device tree. The following are bindings for specific devices: + + i) Xilinx ML300 Framebuffer + + Simple framebuffer device from the ML300 reference design (also on the + ML403 reference design as well as others). + + Optional properties: + - resolution = : pixel resolution of framebuffer. Some + implementations use a different resolution. + Default is + - virt-resolution = : Size of framebuffer in memory. + Default is . + - rotate-display (empty) : rotate display 180 degrees. + + ii) Xilinx SystemACE + + The Xilinx SystemACE device is used to program FPGAs from an FPGA + bitstream stored on a CF card. It can also be used as a generic CF + interface device. + + Optional properties: + - 8-bit (empty) : Set this property for SystemACE in 8 bit mode + + iii) Xilinx EMAC and Xilinx TEMAC + + Xilinx Ethernet devices. In addition to general xilinx properties + listed above, nodes for these devices should include a phy-handle + property, and may include other common network device properties + like local-mac-address. + + iv) Xilinx Uartlite + + Xilinx uartlite devices are simple fixed speed serial ports. + + Required properties: + - current-speed : Baud rate of uartlite + + v) Xilinx hwicap + + Xilinx hwicap devices provide access to the configuration logic + of the FPGA through the Internal Configuration Access Port + (ICAP). The ICAP enables partial reconfiguration of the FPGA, + readback of the configuration information, and some control over + 'warm boots' of the FPGA fabric. + + Required properties: + - xlnx,family : The family of the FPGA, necessary since the + capabilities of the underlying ICAP hardware + differ between different families. May be + 'virtex2p', 'virtex4', or 'virtex5'. + + vi) Xilinx Uart 16550 + + Xilinx UART 16550 devices are very similar to the NS16550 but with + different register spacing and an offset from the base address. + + Required properties: + - clock-frequency : Frequency of the clock input + - reg-offset : A value of 3 is required + - reg-shift : A value of 2 is required + + -- cgit v1.2.2 From 5054d39e327f76df022163a2ebd02e444c5d65f9 Mon Sep 17 00:00:00 2001 From: Antonio Ospite Date: Fri, 19 Jun 2009 13:55:42 +0200 Subject: leds: LED driver for National Semiconductor LP3944 Funlight Chip LEDs driver for National Semiconductor LP3944 Funlight Chip http://www.national.com/pf/LP/LP3944.html This helper chip can drive up to 8 leds, with two programmable DIM modes; it could even be used as a gpio expander but this driver assumes it is used as a led controller. The DIM modes are used to set _blink_ patterns for leds, the pattern is specified supplying two parameters: - period: from 0s to 1.6s - duty cycle: percentage of the period the led is on, from 0 to 100 LP3944 can be found on Motorola A910 smartphone, where it drives the rgb leds, the camera flash light and the displays backlights. Signed-off-by: Antonio Ospite Signed-off-by: Richard Purdie --- Documentation/leds-lp3944.txt | 50 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+) create mode 100644 Documentation/leds-lp3944.txt (limited to 'Documentation') diff --git a/Documentation/leds-lp3944.txt b/Documentation/leds-lp3944.txt new file mode 100644 index 000000000000..c6eda18b15ef --- /dev/null +++ b/Documentation/leds-lp3944.txt @@ -0,0 +1,50 @@ +Kernel driver lp3944 +==================== + + * National Semiconductor LP3944 Fun-light Chip + Prefix: 'lp3944' + Addresses scanned: None (see the Notes section below) + Datasheet: Publicly available at the National Semiconductor website + http://www.national.com/pf/LP/LP3944.html + +Authors: + Antonio Ospite + + +Description +----------- +The LP3944 is a helper chip that can drive up to 8 leds, with two programmable +DIM modes; it could even be used as a gpio expander but this driver assumes it +is used as a led controller. + +The DIM modes are used to set _blink_ patterns for leds, the pattern is +specified supplying two parameters: + - period: from 0s to 1.6s + - duty cycle: percentage of the period the led is on, from 0 to 100 + +Setting a led in DIM0 or DIM1 mode makes it blink according to the pattern. +See the datasheet for details. + +LP3944 can be found on Motorola A910 smartphone, where it drives the rgb +leds, the camera flash light and the lcds power. + + +Notes +----- +The chip is used mainly in embedded contexts, so this driver expects it is +registered using the i2c_board_info mechanism. + +To register the chip at address 0x60 on adapter 0, set the platform data +according to include/linux/leds-lp3944.h, set the i2c board info: + + static struct i2c_board_info __initdata a910_i2c_board_info[] = { + { + I2C_BOARD_INFO("lp3944", 0x60), + .platform_data = &a910_lp3944_leds, + }, + }; + +and register it in the platform init function + + i2c_register_board_info(0, a910_i2c_board_info, + ARRAY_SIZE(a910_i2c_board_info)); -- cgit v1.2.2 From ed88bae6918fa990cbfe47316bd0f790121aaf00 Mon Sep 17 00:00:00 2001 From: Trent Piepho Date: Tue, 12 May 2009 15:33:12 -0700 Subject: leds: Add options to have GPIO LEDs start on or keep their state There already is a "default-on" trigger but there are problems with it. For one, it's a inefficient way to do it and requires led trigger support to be compiled in. But the real reason is that is produces a glitch on the LED. The GPIO is allocate with the LED *off*, then *later* when the trigger runs it is turned back on. If the LED was already on via the GPIO's reset default or action of the firmware, this produces a glitch where the LED goes from on to off to on. While normally this is fast enough that it wouldn't be noticeable to a human observer, there are still serious problems. One is that there may be something else on the GPIO line, like a hardware alarm or watchdog, that is fast enough to notice the glitch. Another is that the kernel may panic before the LED is turned back on, thus hanging with the LED in the wrong state. This is not just speculation, but actually happened to me with an embedded system that has an LED which should turn off when the kernel finishes booting, which was left in the incorrect state due to a bug in the OF LED binding code. We also let GPIO LEDs get their initial value from whatever the current state of the GPIO line is. On some systems the LEDs are put into some state by the firmware or hardware before Linux boots, and it is desired to have them keep this state which is otherwise unknown to Linux. This requires that the underlying GPIO driver support reading the value of output GPIOs. Some drivers support this and some do not. The platform device binding gains a field in the platform data "default_state" that controls this. There are three constants defined to select from on, off, or keeping the current state. The OpenFirmware binding uses a property named "default-state" that can be set to "on", "off", or "keep". The default if the property isn't present is off. Signed-off-by: Trent Piepho Acked-by: Grant Likely Acked-by: Wolfram Sang Acked-by: Sean MacLennan Signed-off-by: Richard Purdie --- Documentation/powerpc/dts-bindings/gpio/led.txt | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/powerpc/dts-bindings/gpio/led.txt b/Documentation/powerpc/dts-bindings/gpio/led.txt index 4fe14deedc0a..064db928c3c1 100644 --- a/Documentation/powerpc/dts-bindings/gpio/led.txt +++ b/Documentation/powerpc/dts-bindings/gpio/led.txt @@ -16,10 +16,17 @@ LED sub-node properties: string defining the trigger assigned to the LED. Current triggers are: "backlight" - LED will act as a back-light, controlled by the framebuffer system - "default-on" - LED will turn on + "default-on" - LED will turn on, but see "default-state" below "heartbeat" - LED "double" flashes at a load average based rate "ide-disk" - LED indicates disk activity "timer" - LED flashes at a fixed, configurable rate +- default-state: (optional) The initial state of the LED. Valid + values are "on", "off", and "keep". If the LED is already on or off + and the default-state property is set the to same value, then no + glitch should be produced where the LED momentarily turns off (or + on). The "keep" setting will keep the LED at whatever its current + state is, without producing a glitch. The default is off if this + property is not present. Examples: @@ -30,14 +37,22 @@ leds { gpios = <&mcu_pio 0 1>; /* Active low */ linux,default-trigger = "ide-disk"; }; + + fault { + gpios = <&mcu_pio 1 0>; + /* Keep LED on if BIOS detected hardware fault */ + default-state = "keep"; + }; }; run-control { compatible = "gpio-leds"; red { gpios = <&mpc8572 6 0>; + default-state = "off"; }; green { gpios = <&mpc8572 7 0>; + default-state = "on"; }; } -- cgit v1.2.2 From 9d612beff5089b89a295a2331883a8ce3fff08c1 Mon Sep 17 00:00:00 2001 From: Li Zefan Date: Wed, 24 Jun 2009 17:33:15 +0800 Subject: tracing: Fix trace_buf_size boot option We should be able to specify [KMG] when setting trace_buf_size boot option, as documented in kernel-parameters.txt Signed-off-by: Li Zefan Cc: Steven Rostedt Cc: Frederic Weisbecker LKML-Reference: <4A41F2DB.4020102@cn.fujitsu.com> Signed-off-by: Ingo Molnar --- Documentation/kernel-parameters.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 92e1ab8178a8..d3f41db3ed49 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2475,7 +2475,8 @@ and is between 256 and 4096 characters. It is defined in the file tp720= [HW,PS2] - trace_buf_size=nn[KMG] [ftrace] will set tracing buffer size. + trace_buf_size=nn[KMG] + [FTRACE] will set tracing buffer size. trix= [HW,OSS] MediaTrix AudioTrix Pro Format: -- cgit v1.2.2 From c912e7a58054304575fe88574c776be7e684098e Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Wed, 24 Jun 2009 14:14:34 +0200 Subject: ALSA: hda - Fix support for Samsung P50 with AD1986A codec Samsung P50 requires the HP auto-muting unlike other Samsung models. Added a new model=samsung-p50 to support this. Signed-off-by: Takashi Iwai --- Documentation/sound/alsa/HD-Audio-Models.txt | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation') diff --git a/Documentation/sound/alsa/HD-Audio-Models.txt b/Documentation/sound/alsa/HD-Audio-Models.txt index 0d8d23581c44..939a3dd58148 100644 --- a/Documentation/sound/alsa/HD-Audio-Models.txt +++ b/Documentation/sound/alsa/HD-Audio-Models.txt @@ -240,6 +240,7 @@ AD1986A laptop-automute 2-channel with EAPD and HP-automute (Lenovo N100) ultra 2-channel with EAPD (Samsung Ultra tablet PC) samsung 2-channel with EAPD (Samsung R65) + samsung-p50 2-channel with HP-automute (Samsung P50) AD1988/AD1988B/AD1989A/AD1989B ============================== -- cgit v1.2.2 From a9d9058abab4ac17b79d500506e6c74bd16cecdc Mon Sep 17 00:00:00 2001 From: Catalin Marinas Date: Thu, 25 Jun 2009 10:16:11 +0100 Subject: kmemleak: Allow the early log buffer to be configurable. (feature suggested by Sergey Senozhatsky) Kmemleak needs to track all the memory allocations but some of these happen before kmemleak is initialised. These are stored in an internal buffer which may be exceeded in some kernel configurations. This patch adds a configuration option with a default value of 400 and also removes the stack dump when the early log buffer is exceeded. Signed-off-by: Catalin Marinas Acked-by: Sergey Senozhatsky --- Documentation/kmemleak.txt | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'Documentation') diff --git a/Documentation/kmemleak.txt b/Documentation/kmemleak.txt index 0112da3b9ab8..f655308064d7 100644 --- a/Documentation/kmemleak.txt +++ b/Documentation/kmemleak.txt @@ -41,6 +41,10 @@ Memory scanning parameters can be modified at run-time by writing to the Kmemleak can also be disabled at boot-time by passing "kmemleak=off" on the kernel command line. +Memory may be allocated or freed before kmemleak is initialised and +these actions are stored in an early log buffer. The size of this buffer +is configured via the CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE option. + Basic Algorithm --------------- -- cgit v1.2.2 From e0a2a1601bec01243bcad44414d06f59dae2eedb Mon Sep 17 00:00:00 2001 From: Catalin Marinas Date: Fri, 26 Jun 2009 17:38:25 +0100 Subject: kmemleak: Enable task stacks scanning by default This is to reduce the number of false positives reported. Signed-off-by: Catalin Marinas --- Documentation/kmemleak.txt | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/kmemleak.txt b/Documentation/kmemleak.txt index f655308064d7..9426e94f291a 100644 --- a/Documentation/kmemleak.txt +++ b/Documentation/kmemleak.txt @@ -31,12 +31,12 @@ Memory scanning parameters can be modified at run-time by writing to the /sys/kernel/debug/kmemleak file. The following parameters are supported: off - disable kmemleak (irreversible) - stack=on - enable the task stacks scanning + stack=on - enable the task stacks scanning (default) stack=off - disable the tasks stacks scanning - scan=on - start the automatic memory scanning thread + scan=on - start the automatic memory scanning thread (default) scan=off - stop the automatic memory scanning thread - scan= - set the automatic memory scanning period in seconds (0 - to disable it) + scan= - set the automatic memory scanning period in seconds + (default 600, 0 to stop the automatic scanning) Kmemleak can also be disabled at boot-time by passing "kmemleak=off" on the kernel command line. -- cgit v1.2.2 From bab4a34afc301fdb81b6ea0e3098d96fc356e03a Mon Sep 17 00:00:00 2001 From: Catalin Marinas Date: Fri, 26 Jun 2009 17:38:26 +0100 Subject: kmemleak: Simplify the reports logged by the scanning thread Because of false positives, the memory scanning thread may print too much information. This patch changes the scanning thread to only print the number of newly suspected leaks. Further information can be read from the /sys/kernel/debug/kmemleak file. Signed-off-by: Catalin Marinas --- Documentation/kmemleak.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/kmemleak.txt b/Documentation/kmemleak.txt index 9426e94f291a..c06f7ba64993 100644 --- a/Documentation/kmemleak.txt +++ b/Documentation/kmemleak.txt @@ -16,9 +16,9 @@ Usage ----- CONFIG_DEBUG_KMEMLEAK in "Kernel hacking" has to be enabled. A kernel -thread scans the memory every 10 minutes (by default) and prints any new -unreferenced objects found. To trigger an intermediate scan and display -all the possible memory leaks: +thread scans the memory every 10 minutes (by default) and prints the +number of new unreferenced objects found. To trigger an intermediate +scan and display the details of all the possible memory leaks: # mount -t debugfs nodev /sys/kernel/debug/ # cat /sys/kernel/debug/kmemleak -- cgit v1.2.2 From 4698c1f2bbe44ce852ef1a6716973c1f5401a4c4 Mon Sep 17 00:00:00 2001 From: Catalin Marinas Date: Fri, 26 Jun 2009 17:38:27 +0100 Subject: kmemleak: Do not trigger a scan when reading the debug/kmemleak file Since there is a kernel thread for automatically scanning the memory, it makes sense for the debug/kmemleak file to only show its findings. This patch also adds support for "echo scan > debug/kmemleak" to trigger an intermediate memory scan and eliminates the kmemleak_mutex (scan_mutex covers all the cases now). Signed-off-by: Catalin Marinas --- Documentation/kmemleak.txt | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/kmemleak.txt b/Documentation/kmemleak.txt index c06f7ba64993..89068030b01b 100644 --- a/Documentation/kmemleak.txt +++ b/Documentation/kmemleak.txt @@ -17,12 +17,16 @@ Usage CONFIG_DEBUG_KMEMLEAK in "Kernel hacking" has to be enabled. A kernel thread scans the memory every 10 minutes (by default) and prints the -number of new unreferenced objects found. To trigger an intermediate -scan and display the details of all the possible memory leaks: +number of new unreferenced objects found. To display the details of all +the possible memory leaks: # mount -t debugfs nodev /sys/kernel/debug/ # cat /sys/kernel/debug/kmemleak +To trigger an intermediate memory scan: + + # echo scan > /sys/kernel/debug/kmemleak + Note that the orphan objects are listed in the order they were allocated and one object at the beginning of the list may cause other subsequent objects to be reported as orphan. @@ -37,6 +41,7 @@ Memory scanning parameters can be modified at run-time by writing to the scan=off - stop the automatic memory scanning thread scan= - set the automatic memory scanning period in seconds (default 600, 0 to stop the automatic scanning) + scan - trigger a memory scan Kmemleak can also be disabled at boot-time by passing "kmemleak=off" on the kernel command line. -- cgit v1.2.2 From 972c71a3183ab41c0b1a9e50842be7e3e980954f Mon Sep 17 00:00:00 2001 From: Peter Oberparleiter Date: Tue, 30 Jun 2009 11:41:20 -0700 Subject: gcov: fix documentation Commonly available versions of cp and tar don't work well with special files created using seq_file. Mention this problem in the gcov documentation and update the helper script example to work around these problems. Signed-off-by: Peter Oberparleiter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/gcov.txt | 25 ++++++++++++++++--------- 1 file changed, 16 insertions(+), 9 deletions(-) (limited to 'Documentation') diff --git a/Documentation/gcov.txt b/Documentation/gcov.txt index e716aadb3a33..40ec63352760 100644 --- a/Documentation/gcov.txt +++ b/Documentation/gcov.txt @@ -188,13 +188,18 @@ Solution: Exclude affected source files from profiling by specifying GCOV_PROFILE := n or GCOV_PROFILE_basename.o := n in the corresponding Makefile. +Problem: Files copied from sysfs appear empty or incomplete. +Cause: Due to the way seq_file works, some tools such as cp or tar + may not correctly copy files from sysfs. +Solution: Use 'cat' to read .gcda files and 'cp -d' to copy links. + Alternatively use the mechanism shown in Appendix B. + Appendix A: gather_on_build.sh ============================== Sample script to gather coverage meta files on the build machine (see 6a): - #!/bin/bash KSRC=$1 @@ -226,7 +231,7 @@ Appendix B: gather_on_test.sh Sample script to gather coverage data files on the test machine (see 6b): -#!/bin/bash +#!/bin/bash -e DEST=$1 GCDA=/sys/kernel/debug/gcov @@ -236,11 +241,13 @@ if [ -z "$DEST" ] ; then exit 1 fi -find $GCDA -name '*.gcno' -o -name '*.gcda' | tar cfz $DEST -T - +TEMPDIR=$(mktemp -d) +echo Collecting data.. +find $GCDA -type d -exec mkdir -p $TEMPDIR/\{\} \; +find $GCDA -name '*.gcda' -exec sh -c 'cat < $0 > '$TEMPDIR'/$0' {} \; +find $GCDA -name '*.gcno' -exec sh -c 'cp -d $0 '$TEMPDIR'/$0' {} \; +tar czf $DEST -C $TEMPDIR sys +rm -rf $TEMPDIR -if [ $? -eq 0 ] ; then - echo "$DEST successfully created, copy to build system and unpack with:" - echo " tar xfz $DEST" -else - echo "Could not create file $DEST" -fi +echo "$DEST successfully created, copy to build system and unpack with:" +echo " tar xfz $DEST" -- cgit v1.2.2 From b55f627feeb9d48fdbde3835e18afbc76712e49b Mon Sep 17 00:00:00 2001 From: David Brownell Date: Tue, 30 Jun 2009 11:41:26 -0700 Subject: spi: new spi->mode bits Add two new spi_device.mode bits to accomodate more protocol options, and pass them through to usermode drivers: * SPI_NO_CS ... a second 3-wire variant, where the chipselect line is removed instead of a data line; transfers are still full duplex. This obviously has STRONG protocol implications since the chipselect transitions can't be used to synchronize state transitions with the SPI master. * SPI_READY ... defines open drain signal that's pulled low to pause the clock. This defines a 5-wire variant (normal 4-wire SPI plus READY) and two 4-wire variants (READY plus each of the 3-wire flavors). Such hardware flow control can be a big win. There are ADC converters and flash chips that expose READY signals, but not many host controllers support it today. The spi_bitbang code should be changed to use SPI_NO_CS instead of its current nonportable hack. That's a mode most hardware can easily support (unlike SPI_READY). Signed-off-by: David Brownell Cc: "Paulraj, Sandeep" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/spi/spidev_test.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/spi/spidev_test.c b/Documentation/spi/spidev_test.c index cf0e3ce0d526..c1a5aad3c75a 100644 --- a/Documentation/spi/spidev_test.c +++ b/Documentation/spi/spidev_test.c @@ -99,11 +99,13 @@ void parse_opts(int argc, char *argv[]) { "lsb", 0, 0, 'L' }, { "cs-high", 0, 0, 'C' }, { "3wire", 0, 0, '3' }, + { "no-cs", 0, 0, 'N' }, + { "ready", 0, 0, 'R' }, { NULL, 0, 0, 0 }, }; int c; - c = getopt_long(argc, argv, "D:s:d:b:lHOLC3", lopts, NULL); + c = getopt_long(argc, argv, "D:s:d:b:lHOLC3NR", lopts, NULL); if (c == -1) break; @@ -139,6 +141,12 @@ void parse_opts(int argc, char *argv[]) case '3': mode |= SPI_3WIRE; break; + case 'N': + mode |= SPI_NO_CS; + break; + case 'R': + mode |= SPI_READY; + break; default: print_usage(argv[0]); break; -- cgit v1.2.2 From b37f2d4de6dfce4bfd6df311af80e4d61458ee1e Mon Sep 17 00:00:00 2001 From: Nikanth Karthikesan Date: Tue, 30 Jun 2009 11:41:36 -0700 Subject: cpusets: document adding/removing cpus to cpuset elaborately By writing a tasks's pid to the file, a process adds that task to that cgroup/cpuset. But to add a cpu/mem to a cpuset, the new list of cpus should be written to the cpuset.mems file which would replace the old list of cpus. Make this clearer in the documentation. Signed-off-by: Nikanth Karthikesan Signed-off-by: Li Zefan Acked-by: Paul Menage Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/cgroups/cpusets.txt | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'Documentation') diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt index f9ca389dddf4..1d7e9784439a 100644 --- a/Documentation/cgroups/cpusets.txt +++ b/Documentation/cgroups/cpusets.txt @@ -777,6 +777,18 @@ in cpuset directories: # /bin/echo 1-4 > cpus -> set cpus list to cpus 1,2,3,4 # /bin/echo 1,2,3,4 > cpus -> set cpus list to cpus 1,2,3,4 +To add a CPU to a cpuset, write the new list of CPUs including the +CPU to be added. To add 6 to the above cpuset: + +# /bin/echo 1-4,6 > cpus -> set cpus list to cpus 1,2,3,4,6 + +Similarly to remove a CPU from a cpuset, write the new list of CPUs +without the CPU to be removed. + +To remove all the CPUs: + +# /bin/echo "" > cpus -> clear cpus list + 2.3 Setting flags ----------------- -- cgit v1.2.2 From 61fd21670d048017c81e62f60894ef1b04b481db Mon Sep 17 00:00:00 2001 From: Andre Noll Date: Thu, 25 Jun 2009 13:03:04 +0200 Subject: Trivial typo fixes in Documentation/block/data-integrity.txt. Signed-off-by: Andre Noll Acked-by: Martin K. Petersen Signed-off-by: Jens Axboe --- Documentation/block/data-integrity.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/block/data-integrity.txt b/Documentation/block/data-integrity.txt index e8ca040ba2cf..2d735b0ae383 100644 --- a/Documentation/block/data-integrity.txt +++ b/Documentation/block/data-integrity.txt @@ -50,7 +50,7 @@ encouraged them to allow separation of the data and integrity metadata scatter-gather lists. The controller will interleave the buffers on write and split them on -read. This means that the Linux can DMA the data buffers to and from +read. This means that Linux can DMA the data buffers to and from host memory without changes to the page cache. Also, the 16-bit CRC checksum mandated by both the SCSI and SATA specs @@ -66,7 +66,7 @@ software RAID5). The IP checksum is weaker than the CRC in terms of detecting bit errors. However, the strength is really in the separation of the data -buffers and the integrity metadata. These two distinct buffers much +buffers and the integrity metadata. These two distinct buffers must match up for an I/O to complete. The separation of the data and integrity metadata buffers as well as -- cgit v1.2.2 From 9f38a920b232800fd4000ba3d4617b41198e017e Mon Sep 17 00:00:00 2001 From: Andy Walls Date: Fri, 3 Jul 2009 11:33:17 -0300 Subject: V4L/DVB (12181): get_dvb_firmware: Add Yuan MPC718 MT352 DVB-T "firmware" extraction Add routine to support extracting the MT352 DVB-T demodulator initialization sequence for Yuan MPC718 cards for use by the cx18 driver. This routine uses a hueristic for extracting a good sequence. It should work on all different versions of the "yuanrap.sys" file, given the way the MT352 tuning sequences are stored in all versions of that file I have seen so far. However, the current patch simply looks for one specific archive URL. Signed-off-by: Andy Walls Signed-off-by: Mauro Carvalho Chehab --- Documentation/dvb/get_dvb_firmware | 52 +++++++++++++++++++++++++++++++++++++- 1 file changed, 51 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/dvb/get_dvb_firmware b/Documentation/dvb/get_dvb_firmware index a52adfc9a57f..64174d6258f0 100644 --- a/Documentation/dvb/get_dvb_firmware +++ b/Documentation/dvb/get_dvb_firmware @@ -25,7 +25,7 @@ use IO::Handle; "tda10046lifeview", "av7110", "dec2000t", "dec2540t", "dec3000s", "vp7041", "dibusb", "nxt2002", "nxt2004", "or51211", "or51132_qam", "or51132_vsb", "bluebird", - "opera1", "cx231xx", "cx18", "cx23885", "pvrusb2" ); + "opera1", "cx231xx", "cx18", "cx23885", "pvrusb2", "mpc718" ); # Check args syntax() if (scalar(@ARGV) != 1); @@ -381,6 +381,56 @@ sub cx18 { $allfiles; } +sub mpc718 { + my $archive = 'Yuan MPC718 TV Tuner Card 2.13.10.1016.zip'; + my $url = "ftp://ftp.work.acer-euro.com/desktop/aspire_idea510/vista/Drivers/$archive"; + my $fwfile = "dvb-cx18-mpc718-mt352.fw"; + my $tmpdir = tempdir(DIR => "/tmp", CLEANUP => 1); + + checkstandard(); + wgetfile($archive, $url); + unzip($archive, $tmpdir); + + my $sourcefile = "$tmpdir/Yuan MPC718 TV Tuner Card 2.13.10.1016/mpc718_32bit/yuanrap.sys"; + my $found = 0; + + open IN, '<', $sourcefile or die "Couldn't open $sourcefile to extract $fwfile data\n"; + binmode IN; + open OUT, '>', $fwfile; + binmode OUT; + { + # Block scope because we change the line terminator variable $/ + my $prevlen = 0; + my $currlen; + + # Buried in the data segment are 3 runs of almost identical + # register-value pairs that end in 0x5d 0x01 which is a "TUNER GO" + # command for the MT352. + # Pull out the middle run (because it's easy) of register-value + # pairs to make the "firmware" file. + + local $/ = "\x5d\x01"; # MT352 "TUNER GO" + + while () { + $currlen = length($_); + if ($prevlen == $currlen || $currlen <= 64) { + chop; chop; # Get rid of "TUNER GO" + s/^\0\0//; # get rid of leading 00 00 if it's there + printf OUT "$_"; + $found = 1; + last; + } + } + } + close OUT; + close IN; + if (!$found) { + unlink $fwfile; + die "Couldn't find valid register-value sequence in $sourcefile for $fwfile\n"; + } + $fwfile; +} + sub cx23885 { my $url = "http://linuxtv.org/downloads/firmware/"; -- cgit v1.2.2 From 02e7804b2135ff941b8846f5820cf48fbfdadd54 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Mon, 29 Jun 2009 11:35:05 -0300 Subject: V4L/DVB (12138): em28xx: add support for Silvercrest Webcam This webcam uses a em2710 chipset, that identifies itself as em2820, plus a mt9v011 sensor, and a DY-301P lens. It needs a few different initializations than a normal em28xx device. Thanks to Hans de Goede and Douglas Landgraf for providing the acces for the webcam during this weekend, I could make a patch for it while returning back from FISL/Fudcom LATAM 2009. Signed-off-by: Mauro Carvalho Chehab --- Documentation/video4linux/CARDLIST.em28xx | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation') diff --git a/Documentation/video4linux/CARDLIST.em28xx b/Documentation/video4linux/CARDLIST.em28xx index 873630e7e53e..014d255231fc 100644 --- a/Documentation/video4linux/CARDLIST.em28xx +++ b/Documentation/video4linux/CARDLIST.em28xx @@ -66,3 +66,4 @@ 68 -> Terratec AV350 (em2860) [0ccd:0084] 69 -> KWorld ATSC 315U HDTV TV Box (em2882) [eb1a:a313] 70 -> Evga inDtube (em2882) + 71 -> Silvercrest Webcam 1.3mpix (em2820/em2840) -- cgit v1.2.2 From 0a6843483c256c859cd9542361812a29403f0fb5 Mon Sep 17 00:00:00 2001 From: Andy Walls Date: Sun, 5 Jul 2009 16:22:45 -0300 Subject: V4L/DVB (12206): get_dvb_firmware: Correct errors in MPC718 firmware extraction logic The extraction routine for the MPC718 "firmware" had 2 bugs in it, where one bug masked the effect of the other. The loop iteration should have set $prevlen = $currlen at the end of the loop, and the if() check should have used && instead of || for deciding if the firmware length is reasonable. Signed-off-by: Andy Walls Signed-off-by: Mauro Carvalho Chehab --- Documentation/dvb/get_dvb_firmware | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/dvb/get_dvb_firmware b/Documentation/dvb/get_dvb_firmware index 64174d6258f0..3d1b0ab70c8e 100644 --- a/Documentation/dvb/get_dvb_firmware +++ b/Documentation/dvb/get_dvb_firmware @@ -413,13 +413,14 @@ sub mpc718 { while () { $currlen = length($_); - if ($prevlen == $currlen || $currlen <= 64) { + if ($prevlen == $currlen && $currlen <= 64) { chop; chop; # Get rid of "TUNER GO" s/^\0\0//; # get rid of leading 00 00 if it's there printf OUT "$_"; $found = 1; last; } + $prevlen = $currlen; } } close OUT; -- cgit v1.2.2 From 37c90e8887efd218dc4af949b7f498ca2da4af9f Mon Sep 17 00:00:00 2001 From: "venkatesh.pallipadi@intel.com" Date: Thu, 2 Jul 2009 17:08:31 -0700 Subject: [CPUFREQ] Mark policy_rwsem as going static in cpufreq.c wont be exported lock_policy_rwsem_* and unlock_policy_rwsem_* routines in cpufreq.c are currently exported to drivers. Improper use of those locks can result in deadlocks and it is better to keep the locks localized. Two previous in-kernel users of these interfaces (ondemand and conservative), do not use this interfaces any more. Schedule them for removal. Signed-off-by: Venkatesh Pallipadi Signed-off-by: Dave Jones --- Documentation/feature-removal-schedule.txt | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index f8cd450be9aa..09e031c55887 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -458,3 +458,13 @@ Why: Remove the old legacy 32bit machine check code. This has been but the old version has been kept around for easier testing. Note this doesn't impact the old P5 and WinChip machine check handlers. Who: Andi Kleen + +---------------------------- + +What: lock_policy_rwsem_* and unlock_policy_rwsem_* will not be + exported interface anymore. +When: 2.6.33 +Why: cpu_policy_rwsem has a new cleaner definition making it local to + cpufreq core and contained inside cpufreq.c. Other dependent + drivers should not use it in order to safely avoid lockdep issues. +Who: Venkatesh Pallipadi -- cgit v1.2.2 From b9744d19e35d74f965fb94bd55f9313d3a7d9e54 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Tue, 7 Jul 2009 11:10:12 +0200 Subject: mac80211: fix docbook These two functions no longer exist in mac80211, so trying to insert them generates warnings in the document. Signed-off-by: Johannes Berg Signed-off-by: John W. Linville --- Documentation/DocBook/mac80211.tmpl | 2 -- 1 file changed, 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/DocBook/mac80211.tmpl b/Documentation/DocBook/mac80211.tmpl index e36986663570..f3f37f141dbd 100644 --- a/Documentation/DocBook/mac80211.tmpl +++ b/Documentation/DocBook/mac80211.tmpl @@ -184,8 +184,6 @@ usage should require reading the full document. !Finclude/net/mac80211.h ieee80211_ctstoself_get !Finclude/net/mac80211.h ieee80211_ctstoself_duration !Finclude/net/mac80211.h ieee80211_generic_frame_duration -!Finclude/net/mac80211.h ieee80211_get_hdrlen_from_skb -!Finclude/net/mac80211.h ieee80211_hdrlen !Finclude/net/mac80211.h ieee80211_wake_queue !Finclude/net/mac80211.h ieee80211_stop_queue !Finclude/net/mac80211.h ieee80211_wake_queues -- cgit v1.2.2 From 8d7ff4f2a0b22b7d6d7bc3982257d1dadea22824 Mon Sep 17 00:00:00 2001 From: Robert Richter Date: Tue, 23 Jun 2009 11:48:14 +0200 Subject: x86/oprofile: rename kernel parameter for architectural perfmon to arch_perfmon The short name of the achitecture is 'arch_perfmon'. This patch changes the kernel parameter to use this name. Cc: Andi Kleen Signed-off-by: Robert Richter Signed-off-by: Ingo Molnar --- Documentation/kernel-parameters.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 92e1ab8178a8..c59e965a748d 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1728,8 +1728,8 @@ and is between 256 and 4096 characters. It is defined in the file oprofile.cpu_type= Force an oprofile cpu type This might be useful if you have an older oprofile userland or if you want common events. - Format: { archperfmon } - archperfmon: [X86] Force use of architectural + Format: { arch_perfmon } + arch_perfmon: [X86] Force use of architectural perfmon on Intel CPUs instead of the CPU specific event set. -- cgit v1.2.2 From 3697cd9aa80125f7717c3c7e7253cfa49a39a388 Mon Sep 17 00:00:00 2001 From: Amerigo Wang Date: Fri, 10 Jul 2009 15:02:41 -0700 Subject: Doc: update Documentation/exception.txt Update Documentation/exception.txt. Remove trailing whitespaces in it. Signed-off-by: WANG Cong Signed-off-by: Randy Dunlap Signed-off-by: Linus Torvalds --- Documentation/exception.txt | 202 ++++++++++++++++++++++---------------------- 1 file changed, 101 insertions(+), 101 deletions(-) (limited to 'Documentation') diff --git a/Documentation/exception.txt b/Documentation/exception.txt index 2d5aded64247..32901aa36f0a 100644 --- a/Documentation/exception.txt +++ b/Documentation/exception.txt @@ -1,123 +1,123 @@ - Kernel level exception handling in Linux 2.1.8 + Kernel level exception handling in Linux Commentary by Joerg Pommnitz -When a process runs in kernel mode, it often has to access user -mode memory whose address has been passed by an untrusted program. +When a process runs in kernel mode, it often has to access user +mode memory whose address has been passed by an untrusted program. To protect itself the kernel has to verify this address. -In older versions of Linux this was done with the -int verify_area(int type, const void * addr, unsigned long size) +In older versions of Linux this was done with the +int verify_area(int type, const void * addr, unsigned long size) function (which has since been replaced by access_ok()). -This function verified that the memory area starting at address +This function verified that the memory area starting at address 'addr' and of size 'size' was accessible for the operation specified -in type (read or write). To do this, verify_read had to look up the -virtual memory area (vma) that contained the address addr. In the -normal case (correctly working program), this test was successful. +in type (read or write). To do this, verify_read had to look up the +virtual memory area (vma) that contained the address addr. In the +normal case (correctly working program), this test was successful. It only failed for a few buggy programs. In some kernel profiling tests, this normally unneeded verification used up a considerable amount of time. -To overcome this situation, Linus decided to let the virtual memory +To overcome this situation, Linus decided to let the virtual memory hardware present in every Linux-capable CPU handle this test. How does this work? -Whenever the kernel tries to access an address that is currently not -accessible, the CPU generates a page fault exception and calls the -page fault handler +Whenever the kernel tries to access an address that is currently not +accessible, the CPU generates a page fault exception and calls the +page fault handler void do_page_fault(struct pt_regs *regs, unsigned long error_code) -in arch/i386/mm/fault.c. The parameters on the stack are set up by -the low level assembly glue in arch/i386/kernel/entry.S. The parameter -regs is a pointer to the saved registers on the stack, error_code +in arch/x86/mm/fault.c. The parameters on the stack are set up by +the low level assembly glue in arch/x86/kernel/entry_32.S. The parameter +regs is a pointer to the saved registers on the stack, error_code contains a reason code for the exception. -do_page_fault first obtains the unaccessible address from the CPU -control register CR2. If the address is within the virtual address -space of the process, the fault probably occurred, because the page -was not swapped in, write protected or something similar. However, -we are interested in the other case: the address is not valid, there -is no vma that contains this address. In this case, the kernel jumps -to the bad_area label. - -There it uses the address of the instruction that caused the exception -(i.e. regs->eip) to find an address where the execution can continue -(fixup). If this search is successful, the fault handler modifies the -return address (again regs->eip) and returns. The execution will +do_page_fault first obtains the unaccessible address from the CPU +control register CR2. If the address is within the virtual address +space of the process, the fault probably occurred, because the page +was not swapped in, write protected or something similar. However, +we are interested in the other case: the address is not valid, there +is no vma that contains this address. In this case, the kernel jumps +to the bad_area label. + +There it uses the address of the instruction that caused the exception +(i.e. regs->eip) to find an address where the execution can continue +(fixup). If this search is successful, the fault handler modifies the +return address (again regs->eip) and returns. The execution will continue at the address in fixup. Where does fixup point to? -Since we jump to the contents of fixup, fixup obviously points -to executable code. This code is hidden inside the user access macros. -I have picked the get_user macro defined in include/asm/uaccess.h as an -example. The definition is somewhat hard to follow, so let's peek at +Since we jump to the contents of fixup, fixup obviously points +to executable code. This code is hidden inside the user access macros. +I have picked the get_user macro defined in arch/x86/include/asm/uaccess.h +as an example. The definition is somewhat hard to follow, so let's peek at the code generated by the preprocessor and the compiler. I selected -the get_user call in drivers/char/console.c for a detailed examination. +the get_user call in drivers/char/sysrq.c for a detailed examination. -The original code in console.c line 1405: +The original code in sysrq.c line 587: get_user(c, buf); The preprocessor output (edited to become somewhat readable): ( - { - long __gu_err = - 14 , __gu_val = 0; - const __typeof__(*( ( buf ) )) *__gu_addr = ((buf)); - if (((((0 + current_set[0])->tss.segment) == 0x18 ) || - (((sizeof(*(buf))) <= 0xC0000000UL) && - ((unsigned long)(__gu_addr ) <= 0xC0000000UL - (sizeof(*(buf))))))) + { + long __gu_err = - 14 , __gu_val = 0; + const __typeof__(*( ( buf ) )) *__gu_addr = ((buf)); + if (((((0 + current_set[0])->tss.segment) == 0x18 ) || + (((sizeof(*(buf))) <= 0xC0000000UL) && + ((unsigned long)(__gu_addr ) <= 0xC0000000UL - (sizeof(*(buf))))))) do { - __gu_err = 0; - switch ((sizeof(*(buf)))) { - case 1: - __asm__ __volatile__( - "1: mov" "b" " %2,%" "b" "1\n" - "2:\n" - ".section .fixup,\"ax\"\n" - "3: movl %3,%0\n" - " xor" "b" " %" "b" "1,%" "b" "1\n" - " jmp 2b\n" - ".section __ex_table,\"a\"\n" - " .align 4\n" - " .long 1b,3b\n" + __gu_err = 0; + switch ((sizeof(*(buf)))) { + case 1: + __asm__ __volatile__( + "1: mov" "b" " %2,%" "b" "1\n" + "2:\n" + ".section .fixup,\"ax\"\n" + "3: movl %3,%0\n" + " xor" "b" " %" "b" "1,%" "b" "1\n" + " jmp 2b\n" + ".section __ex_table,\"a\"\n" + " .align 4\n" + " .long 1b,3b\n" ".text" : "=r"(__gu_err), "=q" (__gu_val): "m"((*(struct __large_struct *) - ( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err )) ; - break; - case 2: + ( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err )) ; + break; + case 2: __asm__ __volatile__( - "1: mov" "w" " %2,%" "w" "1\n" - "2:\n" - ".section .fixup,\"ax\"\n" - "3: movl %3,%0\n" - " xor" "w" " %" "w" "1,%" "w" "1\n" - " jmp 2b\n" - ".section __ex_table,\"a\"\n" - " .align 4\n" - " .long 1b,3b\n" + "1: mov" "w" " %2,%" "w" "1\n" + "2:\n" + ".section .fixup,\"ax\"\n" + "3: movl %3,%0\n" + " xor" "w" " %" "w" "1,%" "w" "1\n" + " jmp 2b\n" + ".section __ex_table,\"a\"\n" + " .align 4\n" + " .long 1b,3b\n" ".text" : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *) - ( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err )); - break; - case 4: - __asm__ __volatile__( - "1: mov" "l" " %2,%" "" "1\n" - "2:\n" - ".section .fixup,\"ax\"\n" - "3: movl %3,%0\n" - " xor" "l" " %" "" "1,%" "" "1\n" - " jmp 2b\n" - ".section __ex_table,\"a\"\n" - " .align 4\n" " .long 1b,3b\n" + ( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err )); + break; + case 4: + __asm__ __volatile__( + "1: mov" "l" " %2,%" "" "1\n" + "2:\n" + ".section .fixup,\"ax\"\n" + "3: movl %3,%0\n" + " xor" "l" " %" "" "1,%" "" "1\n" + " jmp 2b\n" + ".section __ex_table,\"a\"\n" + " .align 4\n" " .long 1b,3b\n" ".text" : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *) - ( __gu_addr )) ), "i"(- 14 ), "0"(__gu_err)); - break; - default: - (__gu_val) = __get_user_bad(); - } - } while (0) ; - ((c)) = (__typeof__(*((buf))))__gu_val; + ( __gu_addr )) ), "i"(- 14 ), "0"(__gu_err)); + break; + default: + (__gu_val) = __get_user_bad(); + } + } while (0) ; + ((c)) = (__typeof__(*((buf))))__gu_val; __gu_err; } ); @@ -127,12 +127,12 @@ see what code gcc generates: > xorl %edx,%edx > movl current_set,%eax - > cmpl $24,788(%eax) - > je .L1424 + > cmpl $24,788(%eax) + > je .L1424 > cmpl $-1073741825,64(%esp) - > ja .L1423 + > ja .L1423 > .L1424: - > movl %edx,%eax + > movl %edx,%eax > movl 64(%esp),%ebx > #APP > 1: movb (%ebx),%dl /* this is the actual user access */ @@ -149,17 +149,17 @@ see what code gcc generates: > .L1423: > movzbl %dl,%esi -The optimizer does a good job and gives us something we can actually -understand. Can we? The actual user access is quite obvious. Thanks -to the unified address space we can just access the address in user +The optimizer does a good job and gives us something we can actually +understand. Can we? The actual user access is quite obvious. Thanks +to the unified address space we can just access the address in user memory. But what does the .section stuff do????? To understand this we have to look at the final kernel: > objdump --section-headers vmlinux - > + > > vmlinux: file format elf32-i386 - > + > > Sections: > Idx Name Size VMA LMA File off Algn > 0 .text 00098f40 c0100000 c0100000 00001000 2**4 @@ -198,18 +198,18 @@ final kernel executable: The whole user memory access is reduced to 10 x86 machine instructions. The instructions bracketed in the .section directives are no longer -in the normal execution path. They are located in a different section +in the normal execution path. They are located in a different section of the executable file: > objdump --disassemble --section=.fixup vmlinux - > + > > c0199ff5 <.fixup+10b5> movl $0xfffffff2,%eax > c0199ffa <.fixup+10ba> xorb %dl,%dl > c0199ffc <.fixup+10bc> jmp c017e7a7 And finally: > objdump --full-contents --section=__ex_table vmlinux - > + > > c01aa7c4 93c017c0 e09f19c0 97c017c0 99c017c0 ................ > c01aa7d4 f6c217c0 e99f19c0 a5e717c0 f59f19c0 ................ > c01aa7e4 080a18c0 01a019c0 0a0a18c0 04a019c0 ................ @@ -235,8 +235,8 @@ sections in the ELF object file. So the instructions ended up in the .fixup section of the object file and the addresses .long 1b,3b ended up in the __ex_table section of the object file. 1b and 3b -are local labels. The local label 1b (1b stands for next label 1 -backward) is the address of the instruction that might fault, i.e. +are local labels. The local label 1b (1b stands for next label 1 +backward) is the address of the instruction that might fault, i.e. in our case the address of the label 1 is c017e7a5: the original assembly code: > 1: movb (%ebx),%dl and linked in vmlinux : > c017e7a5 movb (%ebx),%dl @@ -254,7 +254,7 @@ The assembly code becomes the value pair > c01aa7d4 c017c2f6 c0199fe9 c017e7a5 c0199ff5 ................ ^this is ^this is - 1b 3b + 1b 3b c017e7a5,c0199ff5 in the exception table of the kernel. So, what actually happens if a fault from kernel mode with no suitable @@ -266,9 +266,9 @@ vma occurs? 3.) CPU calls do_page_fault 4.) do page fault calls search_exception_table (regs->eip == c017e7a5); 5.) search_exception_table looks up the address c017e7a5 in the - exception table (i.e. the contents of the ELF section __ex_table) + exception table (i.e. the contents of the ELF section __ex_table) and returns the address of the associated fault handle code c0199ff5. -6.) do_page_fault modifies its own return address to point to the fault +6.) do_page_fault modifies its own return address to point to the fault handle code and returns. 7.) execution continues in the fault handling code. 8.) 8a) EAX becomes -EFAULT (== -14) -- cgit v1.2.2 From c368b4921bc6e309aba2fbee0efcbbc965008d9f Mon Sep 17 00:00:00 2001 From: Amerigo Wang Date: Fri, 10 Jul 2009 15:02:44 -0700 Subject: Doc: move Documentation/exception.txt into x86 subdir exception.txt only explains the code on x86, so it's better to move it into Documentation/x86 directory. And also rename it to exception-tables.txt which looks much more reasonable. This patch is on top of the previous one. Signed-off-by: WANG Cong Signed-off-by: Randy Dunlap Signed-off-by: Linus Torvalds --- Documentation/exception.txt | 292 --------------------------------- Documentation/x86/00-INDEX | 2 + Documentation/x86/exception-tables.txt | 292 +++++++++++++++++++++++++++++++++ 3 files changed, 294 insertions(+), 292 deletions(-) delete mode 100644 Documentation/exception.txt create mode 100644 Documentation/x86/exception-tables.txt (limited to 'Documentation') diff --git a/Documentation/exception.txt b/Documentation/exception.txt deleted file mode 100644 index 32901aa36f0a..000000000000 --- a/Documentation/exception.txt +++ /dev/null @@ -1,292 +0,0 @@ - Kernel level exception handling in Linux - Commentary by Joerg Pommnitz - -When a process runs in kernel mode, it often has to access user -mode memory whose address has been passed by an untrusted program. -To protect itself the kernel has to verify this address. - -In older versions of Linux this was done with the -int verify_area(int type, const void * addr, unsigned long size) -function (which has since been replaced by access_ok()). - -This function verified that the memory area starting at address -'addr' and of size 'size' was accessible for the operation specified -in type (read or write). To do this, verify_read had to look up the -virtual memory area (vma) that contained the address addr. In the -normal case (correctly working program), this test was successful. -It only failed for a few buggy programs. In some kernel profiling -tests, this normally unneeded verification used up a considerable -amount of time. - -To overcome this situation, Linus decided to let the virtual memory -hardware present in every Linux-capable CPU handle this test. - -How does this work? - -Whenever the kernel tries to access an address that is currently not -accessible, the CPU generates a page fault exception and calls the -page fault handler - -void do_page_fault(struct pt_regs *regs, unsigned long error_code) - -in arch/x86/mm/fault.c. The parameters on the stack are set up by -the low level assembly glue in arch/x86/kernel/entry_32.S. The parameter -regs is a pointer to the saved registers on the stack, error_code -contains a reason code for the exception. - -do_page_fault first obtains the unaccessible address from the CPU -control register CR2. If the address is within the virtual address -space of the process, the fault probably occurred, because the page -was not swapped in, write protected or something similar. However, -we are interested in the other case: the address is not valid, there -is no vma that contains this address. In this case, the kernel jumps -to the bad_area label. - -There it uses the address of the instruction that caused the exception -(i.e. regs->eip) to find an address where the execution can continue -(fixup). If this search is successful, the fault handler modifies the -return address (again regs->eip) and returns. The execution will -continue at the address in fixup. - -Where does fixup point to? - -Since we jump to the contents of fixup, fixup obviously points -to executable code. This code is hidden inside the user access macros. -I have picked the get_user macro defined in arch/x86/include/asm/uaccess.h -as an example. The definition is somewhat hard to follow, so let's peek at -the code generated by the preprocessor and the compiler. I selected -the get_user call in drivers/char/sysrq.c for a detailed examination. - -The original code in sysrq.c line 587: - get_user(c, buf); - -The preprocessor output (edited to become somewhat readable): - -( - { - long __gu_err = - 14 , __gu_val = 0; - const __typeof__(*( ( buf ) )) *__gu_addr = ((buf)); - if (((((0 + current_set[0])->tss.segment) == 0x18 ) || - (((sizeof(*(buf))) <= 0xC0000000UL) && - ((unsigned long)(__gu_addr ) <= 0xC0000000UL - (sizeof(*(buf))))))) - do { - __gu_err = 0; - switch ((sizeof(*(buf)))) { - case 1: - __asm__ __volatile__( - "1: mov" "b" " %2,%" "b" "1\n" - "2:\n" - ".section .fixup,\"ax\"\n" - "3: movl %3,%0\n" - " xor" "b" " %" "b" "1,%" "b" "1\n" - " jmp 2b\n" - ".section __ex_table,\"a\"\n" - " .align 4\n" - " .long 1b,3b\n" - ".text" : "=r"(__gu_err), "=q" (__gu_val): "m"((*(struct __large_struct *) - ( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err )) ; - break; - case 2: - __asm__ __volatile__( - "1: mov" "w" " %2,%" "w" "1\n" - "2:\n" - ".section .fixup,\"ax\"\n" - "3: movl %3,%0\n" - " xor" "w" " %" "w" "1,%" "w" "1\n" - " jmp 2b\n" - ".section __ex_table,\"a\"\n" - " .align 4\n" - " .long 1b,3b\n" - ".text" : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *) - ( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err )); - break; - case 4: - __asm__ __volatile__( - "1: mov" "l" " %2,%" "" "1\n" - "2:\n" - ".section .fixup,\"ax\"\n" - "3: movl %3,%0\n" - " xor" "l" " %" "" "1,%" "" "1\n" - " jmp 2b\n" - ".section __ex_table,\"a\"\n" - " .align 4\n" " .long 1b,3b\n" - ".text" : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *) - ( __gu_addr )) ), "i"(- 14 ), "0"(__gu_err)); - break; - default: - (__gu_val) = __get_user_bad(); - } - } while (0) ; - ((c)) = (__typeof__(*((buf))))__gu_val; - __gu_err; - } -); - -WOW! Black GCC/assembly magic. This is impossible to follow, so let's -see what code gcc generates: - - > xorl %edx,%edx - > movl current_set,%eax - > cmpl $24,788(%eax) - > je .L1424 - > cmpl $-1073741825,64(%esp) - > ja .L1423 - > .L1424: - > movl %edx,%eax - > movl 64(%esp),%ebx - > #APP - > 1: movb (%ebx),%dl /* this is the actual user access */ - > 2: - > .section .fixup,"ax" - > 3: movl $-14,%eax - > xorb %dl,%dl - > jmp 2b - > .section __ex_table,"a" - > .align 4 - > .long 1b,3b - > .text - > #NO_APP - > .L1423: - > movzbl %dl,%esi - -The optimizer does a good job and gives us something we can actually -understand. Can we? The actual user access is quite obvious. Thanks -to the unified address space we can just access the address in user -memory. But what does the .section stuff do????? - -To understand this we have to look at the final kernel: - - > objdump --section-headers vmlinux - > - > vmlinux: file format elf32-i386 - > - > Sections: - > Idx Name Size VMA LMA File off Algn - > 0 .text 00098f40 c0100000 c0100000 00001000 2**4 - > CONTENTS, ALLOC, LOAD, READONLY, CODE - > 1 .fixup 000016bc c0198f40 c0198f40 00099f40 2**0 - > CONTENTS, ALLOC, LOAD, READONLY, CODE - > 2 .rodata 0000f127 c019a5fc c019a5fc 0009b5fc 2**2 - > CONTENTS, ALLOC, LOAD, READONLY, DATA - > 3 __ex_table 000015c0 c01a9724 c01a9724 000aa724 2**2 - > CONTENTS, ALLOC, LOAD, READONLY, DATA - > 4 .data 0000ea58 c01abcf0 c01abcf0 000abcf0 2**4 - > CONTENTS, ALLOC, LOAD, DATA - > 5 .bss 00018e21 c01ba748 c01ba748 000ba748 2**2 - > ALLOC - > 6 .comment 00000ec4 00000000 00000000 000ba748 2**0 - > CONTENTS, READONLY - > 7 .note 00001068 00000ec4 00000ec4 000bb60c 2**0 - > CONTENTS, READONLY - -There are obviously 2 non standard ELF sections in the generated object -file. But first we want to find out what happened to our code in the -final kernel executable: - - > objdump --disassemble --section=.text vmlinux - > - > c017e785 xorl %edx,%edx - > c017e787 movl 0xc01c7bec,%eax - > c017e78c cmpl $0x18,0x314(%eax) - > c017e793 je c017e79f - > c017e795 cmpl $0xbfffffff,0x40(%esp,1) - > c017e79d ja c017e7a7 - > c017e79f movl %edx,%eax - > c017e7a1 movl 0x40(%esp,1),%ebx - > c017e7a5 movb (%ebx),%dl - > c017e7a7 movzbl %dl,%esi - -The whole user memory access is reduced to 10 x86 machine instructions. -The instructions bracketed in the .section directives are no longer -in the normal execution path. They are located in a different section -of the executable file: - - > objdump --disassemble --section=.fixup vmlinux - > - > c0199ff5 <.fixup+10b5> movl $0xfffffff2,%eax - > c0199ffa <.fixup+10ba> xorb %dl,%dl - > c0199ffc <.fixup+10bc> jmp c017e7a7 - -And finally: - > objdump --full-contents --section=__ex_table vmlinux - > - > c01aa7c4 93c017c0 e09f19c0 97c017c0 99c017c0 ................ - > c01aa7d4 f6c217c0 e99f19c0 a5e717c0 f59f19c0 ................ - > c01aa7e4 080a18c0 01a019c0 0a0a18c0 04a019c0 ................ - -or in human readable byte order: - - > c01aa7c4 c017c093 c0199fe0 c017c097 c017c099 ................ - > c01aa7d4 c017c2f6 c0199fe9 c017e7a5 c0199ff5 ................ - ^^^^^^^^^^^^^^^^^ - this is the interesting part! - > c01aa7e4 c0180a08 c019a001 c0180a0a c019a004 ................ - -What happened? The assembly directives - -.section .fixup,"ax" -.section __ex_table,"a" - -told the assembler to move the following code to the specified -sections in the ELF object file. So the instructions -3: movl $-14,%eax - xorb %dl,%dl - jmp 2b -ended up in the .fixup section of the object file and the addresses - .long 1b,3b -ended up in the __ex_table section of the object file. 1b and 3b -are local labels. The local label 1b (1b stands for next label 1 -backward) is the address of the instruction that might fault, i.e. -in our case the address of the label 1 is c017e7a5: -the original assembly code: > 1: movb (%ebx),%dl -and linked in vmlinux : > c017e7a5 movb (%ebx),%dl - -The local label 3 (backwards again) is the address of the code to handle -the fault, in our case the actual value is c0199ff5: -the original assembly code: > 3: movl $-14,%eax -and linked in vmlinux : > c0199ff5 <.fixup+10b5> movl $0xfffffff2,%eax - -The assembly code - > .section __ex_table,"a" - > .align 4 - > .long 1b,3b - -becomes the value pair - > c01aa7d4 c017c2f6 c0199fe9 c017e7a5 c0199ff5 ................ - ^this is ^this is - 1b 3b -c017e7a5,c0199ff5 in the exception table of the kernel. - -So, what actually happens if a fault from kernel mode with no suitable -vma occurs? - -1.) access to invalid address: - > c017e7a5 movb (%ebx),%dl -2.) MMU generates exception -3.) CPU calls do_page_fault -4.) do page fault calls search_exception_table (regs->eip == c017e7a5); -5.) search_exception_table looks up the address c017e7a5 in the - exception table (i.e. the contents of the ELF section __ex_table) - and returns the address of the associated fault handle code c0199ff5. -6.) do_page_fault modifies its own return address to point to the fault - handle code and returns. -7.) execution continues in the fault handling code. -8.) 8a) EAX becomes -EFAULT (== -14) - 8b) DL becomes zero (the value we "read" from user space) - 8c) execution continues at local label 2 (address of the - instruction immediately after the faulting user access). - -The steps 8a to 8c in a certain way emulate the faulting instruction. - -That's it, mostly. If you look at our example, you might ask why -we set EAX to -EFAULT in the exception handler code. Well, the -get_user macro actually returns a value: 0, if the user access was -successful, -EFAULT on failure. Our original code did not test this -return value, however the inline assembly code in get_user tries to -return -EFAULT. GCC selected EAX to return this value. - -NOTE: -Due to the way that the exception table is built and needs to be ordered, -only use exceptions for code in the .text section. Any other section -will cause the exception table to not be sorted correctly, and the -exceptions will fail. diff --git a/Documentation/x86/00-INDEX b/Documentation/x86/00-INDEX index dbe3377754af..f37b46d34861 100644 --- a/Documentation/x86/00-INDEX +++ b/Documentation/x86/00-INDEX @@ -2,3 +2,5 @@ - this file mtrr.txt - how to use x86 Memory Type Range Registers to increase performance +exception-tables.txt + - why and how Linux kernel uses exception tables on x86 diff --git a/Documentation/x86/exception-tables.txt b/Documentation/x86/exception-tables.txt new file mode 100644 index 000000000000..32901aa36f0a --- /dev/null +++ b/Documentation/x86/exception-tables.txt @@ -0,0 +1,292 @@ + Kernel level exception handling in Linux + Commentary by Joerg Pommnitz + +When a process runs in kernel mode, it often has to access user +mode memory whose address has been passed by an untrusted program. +To protect itself the kernel has to verify this address. + +In older versions of Linux this was done with the +int verify_area(int type, const void * addr, unsigned long size) +function (which has since been replaced by access_ok()). + +This function verified that the memory area starting at address +'addr' and of size 'size' was accessible for the operation specified +in type (read or write). To do this, verify_read had to look up the +virtual memory area (vma) that contained the address addr. In the +normal case (correctly working program), this test was successful. +It only failed for a few buggy programs. In some kernel profiling +tests, this normally unneeded verification used up a considerable +amount of time. + +To overcome this situation, Linus decided to let the virtual memory +hardware present in every Linux-capable CPU handle this test. + +How does this work? + +Whenever the kernel tries to access an address that is currently not +accessible, the CPU generates a page fault exception and calls the +page fault handler + +void do_page_fault(struct pt_regs *regs, unsigned long error_code) + +in arch/x86/mm/fault.c. The parameters on the stack are set up by +the low level assembly glue in arch/x86/kernel/entry_32.S. The parameter +regs is a pointer to the saved registers on the stack, error_code +contains a reason code for the exception. + +do_page_fault first obtains the unaccessible address from the CPU +control register CR2. If the address is within the virtual address +space of the process, the fault probably occurred, because the page +was not swapped in, write protected or something similar. However, +we are interested in the other case: the address is not valid, there +is no vma that contains this address. In this case, the kernel jumps +to the bad_area label. + +There it uses the address of the instruction that caused the exception +(i.e. regs->eip) to find an address where the execution can continue +(fixup). If this search is successful, the fault handler modifies the +return address (again regs->eip) and returns. The execution will +continue at the address in fixup. + +Where does fixup point to? + +Since we jump to the contents of fixup, fixup obviously points +to executable code. This code is hidden inside the user access macros. +I have picked the get_user macro defined in arch/x86/include/asm/uaccess.h +as an example. The definition is somewhat hard to follow, so let's peek at +the code generated by the preprocessor and the compiler. I selected +the get_user call in drivers/char/sysrq.c for a detailed examination. + +The original code in sysrq.c line 587: + get_user(c, buf); + +The preprocessor output (edited to become somewhat readable): + +( + { + long __gu_err = - 14 , __gu_val = 0; + const __typeof__(*( ( buf ) )) *__gu_addr = ((buf)); + if (((((0 + current_set[0])->tss.segment) == 0x18 ) || + (((sizeof(*(buf))) <= 0xC0000000UL) && + ((unsigned long)(__gu_addr ) <= 0xC0000000UL - (sizeof(*(buf))))))) + do { + __gu_err = 0; + switch ((sizeof(*(buf)))) { + case 1: + __asm__ __volatile__( + "1: mov" "b" " %2,%" "b" "1\n" + "2:\n" + ".section .fixup,\"ax\"\n" + "3: movl %3,%0\n" + " xor" "b" " %" "b" "1,%" "b" "1\n" + " jmp 2b\n" + ".section __ex_table,\"a\"\n" + " .align 4\n" + " .long 1b,3b\n" + ".text" : "=r"(__gu_err), "=q" (__gu_val): "m"((*(struct __large_struct *) + ( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err )) ; + break; + case 2: + __asm__ __volatile__( + "1: mov" "w" " %2,%" "w" "1\n" + "2:\n" + ".section .fixup,\"ax\"\n" + "3: movl %3,%0\n" + " xor" "w" " %" "w" "1,%" "w" "1\n" + " jmp 2b\n" + ".section __ex_table,\"a\"\n" + " .align 4\n" + " .long 1b,3b\n" + ".text" : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *) + ( __gu_addr )) ), "i"(- 14 ), "0"( __gu_err )); + break; + case 4: + __asm__ __volatile__( + "1: mov" "l" " %2,%" "" "1\n" + "2:\n" + ".section .fixup,\"ax\"\n" + "3: movl %3,%0\n" + " xor" "l" " %" "" "1,%" "" "1\n" + " jmp 2b\n" + ".section __ex_table,\"a\"\n" + " .align 4\n" " .long 1b,3b\n" + ".text" : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *) + ( __gu_addr )) ), "i"(- 14 ), "0"(__gu_err)); + break; + default: + (__gu_val) = __get_user_bad(); + } + } while (0) ; + ((c)) = (__typeof__(*((buf))))__gu_val; + __gu_err; + } +); + +WOW! Black GCC/assembly magic. This is impossible to follow, so let's +see what code gcc generates: + + > xorl %edx,%edx + > movl current_set,%eax + > cmpl $24,788(%eax) + > je .L1424 + > cmpl $-1073741825,64(%esp) + > ja .L1423 + > .L1424: + > movl %edx,%eax + > movl 64(%esp),%ebx + > #APP + > 1: movb (%ebx),%dl /* this is the actual user access */ + > 2: + > .section .fixup,"ax" + > 3: movl $-14,%eax + > xorb %dl,%dl + > jmp 2b + > .section __ex_table,"a" + > .align 4 + > .long 1b,3b + > .text + > #NO_APP + > .L1423: + > movzbl %dl,%esi + +The optimizer does a good job and gives us something we can actually +understand. Can we? The actual user access is quite obvious. Thanks +to the unified address space we can just access the address in user +memory. But what does the .section stuff do????? + +To understand this we have to look at the final kernel: + + > objdump --section-headers vmlinux + > + > vmlinux: file format elf32-i386 + > + > Sections: + > Idx Name Size VMA LMA File off Algn + > 0 .text 00098f40 c0100000 c0100000 00001000 2**4 + > CONTENTS, ALLOC, LOAD, READONLY, CODE + > 1 .fixup 000016bc c0198f40 c0198f40 00099f40 2**0 + > CONTENTS, ALLOC, LOAD, READONLY, CODE + > 2 .rodata 0000f127 c019a5fc c019a5fc 0009b5fc 2**2 + > CONTENTS, ALLOC, LOAD, READONLY, DATA + > 3 __ex_table 000015c0 c01a9724 c01a9724 000aa724 2**2 + > CONTENTS, ALLOC, LOAD, READONLY, DATA + > 4 .data 0000ea58 c01abcf0 c01abcf0 000abcf0 2**4 + > CONTENTS, ALLOC, LOAD, DATA + > 5 .bss 00018e21 c01ba748 c01ba748 000ba748 2**2 + > ALLOC + > 6 .comment 00000ec4 00000000 00000000 000ba748 2**0 + > CONTENTS, READONLY + > 7 .note 00001068 00000ec4 00000ec4 000bb60c 2**0 + > CONTENTS, READONLY + +There are obviously 2 non standard ELF sections in the generated object +file. But first we want to find out what happened to our code in the +final kernel executable: + + > objdump --disassemble --section=.text vmlinux + > + > c017e785 xorl %edx,%edx + > c017e787 movl 0xc01c7bec,%eax + > c017e78c cmpl $0x18,0x314(%eax) + > c017e793 je c017e79f + > c017e795 cmpl $0xbfffffff,0x40(%esp,1) + > c017e79d ja c017e7a7 + > c017e79f movl %edx,%eax + > c017e7a1 movl 0x40(%esp,1),%ebx + > c017e7a5 movb (%ebx),%dl + > c017e7a7 movzbl %dl,%esi + +The whole user memory access is reduced to 10 x86 machine instructions. +The instructions bracketed in the .section directives are no longer +in the normal execution path. They are located in a different section +of the executable file: + + > objdump --disassemble --section=.fixup vmlinux + > + > c0199ff5 <.fixup+10b5> movl $0xfffffff2,%eax + > c0199ffa <.fixup+10ba> xorb %dl,%dl + > c0199ffc <.fixup+10bc> jmp c017e7a7 + +And finally: + > objdump --full-contents --section=__ex_table vmlinux + > + > c01aa7c4 93c017c0 e09f19c0 97c017c0 99c017c0 ................ + > c01aa7d4 f6c217c0 e99f19c0 a5e717c0 f59f19c0 ................ + > c01aa7e4 080a18c0 01a019c0 0a0a18c0 04a019c0 ................ + +or in human readable byte order: + + > c01aa7c4 c017c093 c0199fe0 c017c097 c017c099 ................ + > c01aa7d4 c017c2f6 c0199fe9 c017e7a5 c0199ff5 ................ + ^^^^^^^^^^^^^^^^^ + this is the interesting part! + > c01aa7e4 c0180a08 c019a001 c0180a0a c019a004 ................ + +What happened? The assembly directives + +.section .fixup,"ax" +.section __ex_table,"a" + +told the assembler to move the following code to the specified +sections in the ELF object file. So the instructions +3: movl $-14,%eax + xorb %dl,%dl + jmp 2b +ended up in the .fixup section of the object file and the addresses + .long 1b,3b +ended up in the __ex_table section of the object file. 1b and 3b +are local labels. The local label 1b (1b stands for next label 1 +backward) is the address of the instruction that might fault, i.e. +in our case the address of the label 1 is c017e7a5: +the original assembly code: > 1: movb (%ebx),%dl +and linked in vmlinux : > c017e7a5 movb (%ebx),%dl + +The local label 3 (backwards again) is the address of the code to handle +the fault, in our case the actual value is c0199ff5: +the original assembly code: > 3: movl $-14,%eax +and linked in vmlinux : > c0199ff5 <.fixup+10b5> movl $0xfffffff2,%eax + +The assembly code + > .section __ex_table,"a" + > .align 4 + > .long 1b,3b + +becomes the value pair + > c01aa7d4 c017c2f6 c0199fe9 c017e7a5 c0199ff5 ................ + ^this is ^this is + 1b 3b +c017e7a5,c0199ff5 in the exception table of the kernel. + +So, what actually happens if a fault from kernel mode with no suitable +vma occurs? + +1.) access to invalid address: + > c017e7a5 movb (%ebx),%dl +2.) MMU generates exception +3.) CPU calls do_page_fault +4.) do page fault calls search_exception_table (regs->eip == c017e7a5); +5.) search_exception_table looks up the address c017e7a5 in the + exception table (i.e. the contents of the ELF section __ex_table) + and returns the address of the associated fault handle code c0199ff5. +6.) do_page_fault modifies its own return address to point to the fault + handle code and returns. +7.) execution continues in the fault handling code. +8.) 8a) EAX becomes -EFAULT (== -14) + 8b) DL becomes zero (the value we "read" from user space) + 8c) execution continues at local label 2 (address of the + instruction immediately after the faulting user access). + +The steps 8a to 8c in a certain way emulate the faulting instruction. + +That's it, mostly. If you look at our example, you might ask why +we set EAX to -EFAULT in the exception handler code. Well, the +get_user macro actually returns a value: 0, if the user access was +successful, -EFAULT on failure. Our original code did not test this +return value, however the inline assembly code in get_user tries to +return -EFAULT. GCC selected EAX to return this value. + +NOTE: +Due to the way that the exception table is built and needs to be ordered, +only use exceptions for code in the .text section. Any other section +will cause the exception table to not be sorted correctly, and the +exceptions will fail. -- cgit v1.2.2 From 909662e1e7290945fa3bca038bc3b7bb5d19499f Mon Sep 17 00:00:00 2001 From: vibi sreenivasan Date: Wed, 8 Jul 2009 15:37:03 -0700 Subject: driver model: fix show/store prototypes in doc. FIX prototypes for show & store method in struct driver_attribute Signed-off-by: vibi sreenivasan Signed-off-by: Randy Dunlap Signed-off-by: Greg Kroah-Hartman --- Documentation/driver-model/driver.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/driver-model/driver.txt b/Documentation/driver-model/driver.txt index 82132169d47a..60120fb3b961 100644 --- a/Documentation/driver-model/driver.txt +++ b/Documentation/driver-model/driver.txt @@ -207,8 +207,8 @@ Attributes ~~~~~~~~~~ struct driver_attribute { struct attribute attr; - ssize_t (*show)(struct device_driver *, char * buf, size_t count, loff_t off); - ssize_t (*store)(struct device_driver *, const char * buf, size_t count, loff_t off); + ssize_t (*show)(struct device_driver *driver, char *buf); + ssize_t (*store)(struct device_driver *, const char * buf, size_t count); }; Device drivers can export attributes via their sysfs directories. -- cgit v1.2.2 From 941297f443f871b8c3372feccf27a8733f6ce9e9 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 16 Jul 2009 14:03:40 +0200 Subject: netfilter: nf_conntrack: nf_conntrack_alloc() fixes When a slab cache uses SLAB_DESTROY_BY_RCU, we must be careful when allocating objects, since slab allocator could give a freed object still used by lockless readers. In particular, nf_conntrack RCU lookups rely on ct->tuplehash[xxx].hnnode.next being always valid (ie containing a valid 'nulls' value, or a valid pointer to next object in hash chain.) kmem_cache_zalloc() setups object with NULL values, but a NULL value is not valid for ct->tuplehash[xxx].hnnode.next. Fix is to call kmem_cache_alloc() and do the zeroing ourself. As spotted by Patrick, we also need to make sure lookup keys are committed to memory before setting refcount to 1, or a lockless reader could get a reference on the old version of the object. Its key re-check could then pass the barrier. Signed-off-by: Eric Dumazet Signed-off-by: Patrick McHardy --- Documentation/RCU/rculist_nulls.txt | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/RCU/rculist_nulls.txt b/Documentation/RCU/rculist_nulls.txt index 93cb28d05dcd..18f9651ff23d 100644 --- a/Documentation/RCU/rculist_nulls.txt +++ b/Documentation/RCU/rculist_nulls.txt @@ -83,11 +83,12 @@ not detect it missed following items in original chain. obj = kmem_cache_alloc(...); lock_chain(); // typically a spin_lock() obj->key = key; -atomic_inc(&obj->refcnt); /* * we need to make sure obj->key is updated before obj->next + * or obj->refcnt */ smp_wmb(); +atomic_set(&obj->refcnt, 1); hlist_add_head_rcu(&obj->obj_node, list); unlock_chain(); // typically a spin_unlock() @@ -159,6 +160,10 @@ out: obj = kmem_cache_alloc(cachep); lock_chain(); // typically a spin_lock() obj->key = key; +/* + * changes to obj->key must be visible before refcnt one + */ +smp_wmb(); atomic_set(&obj->refcnt, 1); /* * insert obj in RCU way (readers might be traversing chain) -- cgit v1.2.2 From 673325951ef440ebace311bd542a9378d1b3025b Mon Sep 17 00:00:00 2001 From: Ralf Baechle Date: Fri, 17 Jul 2009 04:47:19 +0000 Subject: Update Andreas Koensgen's email address The kernel has used a stale email address of Andreas for a few years. Signed-off-by: Ralf Baechle Signed-off-by: David S. Miller --- Documentation/networking/6pack.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/networking/6pack.txt b/Documentation/networking/6pack.txt index d0777a1200e1..8f339428fdf4 100644 --- a/Documentation/networking/6pack.txt +++ b/Documentation/networking/6pack.txt @@ -1,7 +1,7 @@ This is the 6pack-mini-HOWTO, written by Andreas Könsgen DG3KQ -Internet: ajk@iehk.rwth-aachen.de +Internet: ajk@comnets.uni-bremen.de AMPR-net: dg3kq@db0pra.ampr.org AX.25: dg3kq@db0ach.#nrw.deu.eu -- cgit v1.2.2 From acb9c1b2f406d25c381de2b429f65706cc04d3b5 Mon Sep 17 00:00:00 2001 From: Evgeniy Polyakov Date: Tue, 21 Jul 2009 12:43:51 -0700 Subject: connector: maintainer/mail update. Signed-off-by: Evgeniy Polyakov Signed-off-by: David S. Miller --- Documentation/connector/cn_test.c | 4 ++-- Documentation/connector/ucon.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/connector/cn_test.c b/Documentation/connector/cn_test.c index f688eba87704..6a5be5d5c8e4 100644 --- a/Documentation/connector/cn_test.c +++ b/Documentation/connector/cn_test.c @@ -1,7 +1,7 @@ /* * cn_test.c * - * 2004-2005 Copyright (c) Evgeniy Polyakov + * 2004+ Copyright (c) Evgeniy Polyakov * All rights reserved. * * This program is free software; you can redistribute it and/or modify @@ -194,5 +194,5 @@ module_init(cn_test_init); module_exit(cn_test_fini); MODULE_LICENSE("GPL"); -MODULE_AUTHOR("Evgeniy Polyakov "); +MODULE_AUTHOR("Evgeniy Polyakov "); MODULE_DESCRIPTION("Connector's test module"); diff --git a/Documentation/connector/ucon.c b/Documentation/connector/ucon.c index d738cde2a8d5..c5092ad0ce4b 100644 --- a/Documentation/connector/ucon.c +++ b/Documentation/connector/ucon.c @@ -1,7 +1,7 @@ /* * ucon.c * - * Copyright (c) 2004+ Evgeniy Polyakov + * Copyright (c) 2004+ Evgeniy Polyakov * * * This program is free software; you can redistribute it and/or modify -- cgit v1.2.2 From cedb8118e8cef21a2b73fd9cb70660ac19124c16 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Thu, 23 Jul 2009 11:04:13 +0200 Subject: ALSA: pcm - Add logging of hwptr updates and interrupt updates Added the logging functionality to xrun_debug to record the hwptr updates via snd_pcm_update_hw_ptr() and snd_pcm_update_hwptr_interrupt(), corresponding to 16 and 8, respectively. For example, # echo 9 > /proc/asound/card0/pcm0p/xrun_debug will record the position and other parameters at each period interrupt together with the normal XRUN debugging. Signed-off-by: Takashi Iwai --- Documentation/sound/alsa/Procfile.txt | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'Documentation') diff --git a/Documentation/sound/alsa/Procfile.txt b/Documentation/sound/alsa/Procfile.txt index 381908d8ca42..719a819f8cc2 100644 --- a/Documentation/sound/alsa/Procfile.txt +++ b/Documentation/sound/alsa/Procfile.txt @@ -101,6 +101,8 @@ card*/pcm*/xrun_debug bit 0 = Enable XRUN/jiffies debug messages bit 1 = Show stack trace at XRUN / jiffies check bit 2 = Enable additional jiffies check + bit 3 = Log hwptr update at each period interrupt + bit 4 = Log hwptr update at each snd_pcm_update_hw_ptr() When the bit 0 is set, the driver will show the messages to kernel log when an xrun is detected. The debug message is @@ -117,6 +119,9 @@ card*/pcm*/xrun_debug buggy) hardware that doesn't give smooth pointer updates. This feature is enabled via the bit 2. + Bits 3 and 4 are for logging the hwptr records. Note that + these will give flood of kernel messages. + card*/pcm*/sub*/info The general information of this PCM sub-stream. -- cgit v1.2.2 From 8b220793d6fd309176438721088515be893630cd Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Sun, 12 Jul 2009 10:56:21 -0300 Subject: V4L/DVB (12235): em28xx: detects sensors also with the generic em2750/2750 entry Webcams in general don't have eeprom. So, the sensor hint code should be called to properly detect what sensor is inside. Signed-off-by: Mauro Carvalho Chehab --- Documentation/video4linux/CARDLIST.em28xx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/video4linux/CARDLIST.em28xx b/Documentation/video4linux/CARDLIST.em28xx index 014d255231fc..68c236c01846 100644 --- a/Documentation/video4linux/CARDLIST.em28xx +++ b/Documentation/video4linux/CARDLIST.em28xx @@ -20,7 +20,7 @@ 19 -> EM2860/SAA711X Reference Design (em2860) 20 -> AMD ATI TV Wonder HD 600 (em2880) [0438:b002] 21 -> eMPIA Technology, Inc. GrabBeeX+ Video Encoder (em2800) [eb1a:2801] - 22 -> Unknown EM2750/EM2751 webcam grabber (em2750) [eb1a:2750,eb1a:2751] + 22 -> EM2710/EM2750/EM2751 webcam grabber (em2750) [eb1a:2750,eb1a:2751] 23 -> Huaqi DLCW-130 (em2750) 24 -> D-Link DUB-T210 TV Tuner (em2820/em2840) [2001:f112] 25 -> Gadmei UTV310 (em2820/em2840) -- cgit v1.2.2 From 26e744b6b61066203fd57de0d3962353621e06f8 Mon Sep 17 00:00:00 2001 From: Brian Johnson Date: Sun, 19 Jul 2009 05:52:58 -0300 Subject: V4L/DVB (12283): gspca - sn9c20x: New subdriver for sn9c201 and sn9c202 bridges. Signed-off-by: Brian Johnson Signed-off-by: Jean-Francois Moine Signed-off-by: Mauro Carvalho Chehab --- Documentation/video4linux/gspca.txt | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) (limited to 'Documentation') diff --git a/Documentation/video4linux/gspca.txt b/Documentation/video4linux/gspca.txt index 2bcf78896e22..573f95b58807 100644 --- a/Documentation/video4linux/gspca.txt +++ b/Documentation/video4linux/gspca.txt @@ -44,7 +44,9 @@ zc3xx 0458:7007 Genius VideoCam V2 zc3xx 0458:700c Genius VideoCam V3 zc3xx 0458:700f Genius VideoCam Web V2 sonixj 0458:7025 Genius Eye 311Q +sn9c20x 0458:7029 Genius Look 320s sonixj 0458:702e Genius Slim 310 NB +sn9c20x 045e:00f4 LifeCam VX-6000 (SN9C20x + OV9650) sonixj 045e:00f5 MicroSoft VX3000 sonixj 045e:00f7 MicroSoft VX1000 ov519 045e:028c Micro$oft xbox cam @@ -282,6 +284,28 @@ sonixj 0c45:613a Microdia Sonix PC Camera sonixj 0c45:613b Surfer SN-206 sonixj 0c45:613c Sonix Pccam168 sonixj 0c45:6143 Sonix Pccam168 +sn9c20x 0c45:6240 PC Camera (SN9C201 + MT9M001) +sn9c20x 0c45:6242 PC Camera (SN9C201 + MT9M111) +sn9c20x 0c45:6248 PC Camera (SN9C201 + OV9655) +sn9c20x 0c45:624e PC Camera (SN9C201 + SOI968) +sn9c20x 0c45:624f PC Camera (SN9C201 + OV9650) +sn9c20x 0c45:6251 PC Camera (SN9C201 + OV9650) +sn9c20x 0c45:6253 PC Camera (SN9C201 + OV9650) +sn9c20x 0c45:6260 PC Camera (SN9C201 + OV7670) +sn9c20x 0c45:6270 PC Camera (SN9C201 + MT9V011/MT9V111/MT9V112) +sn9c20x 0c45:627b PC Camera (SN9C201 + OV7660) +sn9c20x 0c45:627c PC Camera (SN9C201 + HV7131R) +sn9c20x 0c45:627f PC Camera (SN9C201 + OV9650) +sn9c20x 0c45:6280 PC Camera (SN9C202 + MT9M001) +sn9c20x 0c45:6282 PC Camera (SN9C202 + MT9M111) +sn9c20x 0c45:6288 PC Camera (SN9C202 + OV9655) +sn9c20x 0c45:628e PC Camera (SN9C202 + SOI968) +sn9c20x 0c45:628f PC Camera (SN9C202 + OV9650) +sn9c20x 0c45:62a0 PC Camera (SN9C202 + OV7670) +sn9c20x 0c45:62b0 PC Camera (SN9C202 + MT9V011/MT9V111/MT9V112) +sn9c20x 0c45:62b3 PC Camera (SN9C202 + OV9655) +sn9c20x 0c45:62bb PC Camera (SN9C202 + OV7660) +sn9c20x 0c45:62bc PC Camera (SN9C202 + HV7131R) sunplus 0d64:0303 Sunplus FashionCam DXG etoms 102c:6151 Qcam Sangha CIF etoms 102c:6251 Qcam xxxxxx VGA @@ -290,6 +314,7 @@ spca561 10fd:7e50 FlyCam Usb 100 zc3xx 10fd:8050 Typhoon Webshot II USB 300k ov534 1415:2000 Sony HD Eye for PS3 (SLEH 00201) pac207 145f:013a Trust WB-1300N +sn9c20x 145f:013d Trust WB-3600R vc032x 15b8:6001 HP 2.0 Megapixel vc032x 15b8:6002 HP 2.0 Megapixel rz406aa spca501 1776:501c Arowana 300K CMOS Camera @@ -300,4 +325,11 @@ spca500 2899:012c Toptro Industrial spca508 8086:0110 Intel Easy PC Camera spca500 8086:0630 Intel Pocket PC Camera spca506 99fa:8988 Grandtec V.cap +sn9c20x a168:0610 Dino-Lite Digital Microscope (SN9C201 + HV7131R) +sn9c20x a168:0611 Dino-Lite Digital Microscope (SN9C201 + HV7131R) +sn9c20x a168:0613 Dino-Lite Digital Microscope (SN9C201 + HV7131R) +sn9c20x a168:0618 Dino-Lite Digital Microscope (SN9C201 + HV7131R) +sn9c20x a168:0614 Dino-Lite Digital Microscope (SN9C201 + MT9M111) +sn9c20x a168:0615 Dino-Lite Digital Microscope (SN9C201 + MT9M111) +sn9c20x a168:0617 Dino-Lite Digital Microscope (SN9C201 + MT9M111) spca561 abcd:cdee Petcam -- cgit v1.2.2 From a39ea210ec8c8f6ed381f8dafbe755c57b8f30c3 Mon Sep 17 00:00:00 2001 From: Lucian Adrian Grijincu Date: Mon, 27 Jul 2009 09:06:42 -0700 Subject: driver core: documentation: make it clear that sysfs is optional The original text suggested that sysfs is mandatory and always compiled in the kernel. Signed-off-by: Lucian Adrian Grijincu Signed-off-by: Randy Dunlap Signed-off-by: Greg Kroah-Hartman --- Documentation/filesystems/sysfs.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.txt index 7e81e37c0b1e..b245d524d568 100644 --- a/Documentation/filesystems/sysfs.txt +++ b/Documentation/filesystems/sysfs.txt @@ -23,7 +23,8 @@ interface. Using sysfs ~~~~~~~~~~~ -sysfs is always compiled in. You can access it by doing: +sysfs is always compiled in if CONFIG_SYSFS is defined. You can access +it by doing: mount -t sysfs sysfs /sys -- cgit v1.2.2 From cab8bd3410d448279e3bd0fbf96d31db0bf770fa Mon Sep 17 00:00:00 2001 From: Hidetoshi Seto Date: Wed, 29 Jul 2009 15:04:14 -0700 Subject: sysrq, kdump: make sysrq-c consistent commit d6580a9f15238b87e618310c862231ae3f352d2d ("kexec: sysrq: simplify sysrq-c handler") changed the behavior of sysrq-c to unconditional dereference of NULL pointer. So in cases with CONFIG_KEXEC, where crash_kexec() was directly called from sysrq-c before, now it can be said that a step of "real oops" was inserted before starting kdump. However, in contrast to oops via SysRq-c from keyboard which results in panic due to in_interrupt(), oops via "echo c > /proc/sysrq-trigger" will not become panic unless panic_on_oops=1. It means that even if dump is properly configured to be taken on panic, the sysrq-c from proc interface might not start crashdump while the sysrq-c from keyboard can start crashdump. This confuses traditional users of kdump, i.e. people who expect sysrq-c to do common behavior in both of the keyboard and proc interface. This patch brings the keyboard and proc interface behavior of sysrq-c in line, by forcing panic_on_oops=1 before oops in sysrq-c handler. And some updates in documentation are included, to clarify that there is no longer dependency with CONFIG_KEXEC, and that now the system can just crash by sysrq-c if no dump mechanism is configured. Signed-off-by: Hidetoshi Seto Cc: Lai Jiangshan Cc: Ken'ichi Ohmichi Acked-by: Neil Horman Acked-by: Vivek Goyal Cc: Brayan Arraes Cc: Eric W. Biederman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/sysrq.txt | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/sysrq.txt b/Documentation/sysrq.txt index cf42b820ff9d..d56a01775423 100644 --- a/Documentation/sysrq.txt +++ b/Documentation/sysrq.txt @@ -66,7 +66,8 @@ On all - write a character to /proc/sysrq-trigger. e.g.: 'b' - Will immediately reboot the system without syncing or unmounting your disks. -'c' - Will perform a kexec reboot in order to take a crashdump. +'c' - Will perform a system crash by a NULL pointer dereference. + A crashdump will be taken if configured. 'd' - Shows all locks that are held. @@ -141,8 +142,8 @@ useful when you want to exit a program that will not let you switch consoles. re'B'oot is good when you're unable to shut down. But you should also 'S'ync and 'U'mount first. -'C'rashdump can be used to manually trigger a crashdump when the system is hung. -The kernel needs to have been built with CONFIG_KEXEC enabled. +'C'rash can be used to manually trigger a crashdump when the system is hung. +Note that this just triggers a crash if there is no dump mechanism available. 'S'ync is great when your system is locked up, it allows you to sync your disks and will certainly lessen the chance of data loss and fscking. Note -- cgit v1.2.2 From 8ef562d112c82ec539775698f8b63ac5ec1bd766 Mon Sep 17 00:00:00 2001 From: Rusty Russell Date: Thu, 30 Jul 2009 16:03:43 -0600 Subject: lguest: fix descriptor corruption in example launcher 1d589bb16b825b3a7b4edd34d997f1f1f953033d "Add serial number support for virtio_blk, V4a" extended 'struct virtio_blk_config' to 536 bytes. Lguest and S/390 both use an 8 bit value for the feature length, and this change broke them (if the code is naive). Signed-off-by: Rusty Russell Cc: John Cooper Cc: Christian Borntraeger --- Documentation/lguest/lguest.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c index 9ebcd6ef361b..45d7d6dcae7a 100644 --- a/Documentation/lguest/lguest.c +++ b/Documentation/lguest/lguest.c @@ -1105,6 +1105,9 @@ static void set_config(struct device *dev, unsigned len, const void *conf) /* Copy in the config information, and store the length. */ memcpy(device_config(dev), conf, len); dev->desc->config_len = len; + + /* Size must fit in config_len field (8 bits)! */ + assert(dev->desc->config_len == len); } /* This routine does all the creation and setup of a new device, including @@ -1515,7 +1518,8 @@ static void setup_block_file(const char *filename) add_feature(dev, VIRTIO_BLK_F_SEG_MAX); conf.seg_max = cpu_to_le32(VIRTQUEUE_NUM - 2); - set_config(dev, sizeof(conf), &conf); + /* Don't try to put whole struct: we have 8 bit limit. */ + set_config(dev, offsetof(struct virtio_blk_config, geometry), &conf); verbose("device %u: virtblock %llu sectors\n", ++devices.device_num, le64_to_cpu(conf.capacity)); -- cgit v1.2.2 From 2e04ef76916d1e29a077ea9d0f2003c8fd86724d Mon Sep 17 00:00:00 2001 From: Rusty Russell Date: Thu, 30 Jul 2009 16:03:45 -0600 Subject: lguest: fix comment style I don't really notice it (except to begrudge the extra vertical space), but Ingo does. And he pointed out that one excuse of lguest is as a teaching tool, it should set a good example. Signed-off-by: Rusty Russell Cc: Ingo Molnar --- Documentation/lguest/lguest.c | 540 +++++++++++++++++++++++++++--------------- 1 file changed, 349 insertions(+), 191 deletions(-) (limited to 'Documentation') diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c index 45d7d6dcae7a..aa66a52b73e9 100644 --- a/Documentation/lguest/lguest.c +++ b/Documentation/lguest/lguest.c @@ -1,7 +1,9 @@ -/*P:100 This is the Launcher code, a simple program which lays out the - * "physical" memory for the new Guest by mapping the kernel image and - * the virtual devices, then opens /dev/lguest to tell the kernel - * about the Guest and control it. :*/ +/*P:100 + * This is the Launcher code, a simple program which lays out the "physical" + * memory for the new Guest by mapping the kernel image and the virtual + * devices, then opens /dev/lguest to tell the kernel about the Guest and + * control it. +:*/ #define _LARGEFILE64_SOURCE #define _GNU_SOURCE #include @@ -46,13 +48,15 @@ #include "linux/virtio_rng.h" #include "linux/virtio_ring.h" #include "asm/bootparam.h" -/*L:110 We can ignore the 39 include files we need for this program, but I do - * want to draw attention to the use of kernel-style types. +/*L:110 + * We can ignore the 39 include files we need for this program, but I do want + * to draw attention to the use of kernel-style types. * * As Linus said, "C is a Spartan language, and so should your naming be." I * like these abbreviations, so we define them here. Note that u64 is always * unsigned long long, which works on all Linux systems: this means that we can - * use %llu in printf for any u64. */ + * use %llu in printf for any u64. + */ typedef unsigned long long u64; typedef uint32_t u32; typedef uint16_t u16; @@ -69,8 +73,10 @@ typedef uint8_t u8; /* This will occupy 3 pages: it must be a power of 2. */ #define VIRTQUEUE_NUM 256 -/*L:120 verbose is both a global flag and a macro. The C preprocessor allows - * this, and although I wouldn't recommend it, it works quite nicely here. */ +/*L:120 + * verbose is both a global flag and a macro. The C preprocessor allows + * this, and although I wouldn't recommend it, it works quite nicely here. + */ static bool verbose; #define verbose(args...) \ do { if (verbose) printf(args); } while(0) @@ -100,8 +106,7 @@ struct device_list /* A single linked list of devices. */ struct device *dev; - /* And a pointer to the last device for easy append and also for - * configuration appending. */ + /* And a pointer to the last device for easy append. */ struct device *lastdev; }; @@ -168,20 +173,24 @@ static char **main_args; /* The original tty settings to restore on exit. */ static struct termios orig_term; -/* We have to be careful with barriers: our devices are all run in separate +/* + * We have to be careful with barriers: our devices are all run in separate * threads and so we need to make sure that changes visible to the Guest happen - * in precise order. */ + * in precise order. + */ #define wmb() __asm__ __volatile__("" : : : "memory") #define mb() __asm__ __volatile__("" : : : "memory") -/* Convert an iovec element to the given type. +/* + * Convert an iovec element to the given type. * * This is a fairly ugly trick: we need to know the size of the type and * alignment requirement to check the pointer is kosher. It's also nice to * have the name of the type in case we report failure. * * Typing those three things all the time is cumbersome and error prone, so we - * have a macro which sets them all up and passes to the real function. */ + * have a macro which sets them all up and passes to the real function. + */ #define convert(iov, type) \ ((type *)_convert((iov), sizeof(type), __alignof__(type), #type)) @@ -198,8 +207,10 @@ static void *_convert(struct iovec *iov, size_t size, size_t align, /* Wrapper for the last available index. Makes it easier to change. */ #define lg_last_avail(vq) ((vq)->last_avail_idx) -/* The virtio configuration space is defined to be little-endian. x86 is - * little-endian too, but it's nice to be explicit so we have these helpers. */ +/* + * The virtio configuration space is defined to be little-endian. x86 is + * little-endian too, but it's nice to be explicit so we have these helpers. + */ #define cpu_to_le16(v16) (v16) #define cpu_to_le32(v32) (v32) #define cpu_to_le64(v64) (v64) @@ -241,11 +252,12 @@ static u8 *get_feature_bits(struct device *dev) + dev->num_vq * sizeof(struct lguest_vqconfig); } -/*L:100 The Launcher code itself takes us out into userspace, that scary place - * where pointers run wild and free! Unfortunately, like most userspace - * programs, it's quite boring (which is why everyone likes to hack on the - * kernel!). Perhaps if you make up an Lguest Drinking Game at this point, it - * will get you through this section. Or, maybe not. +/*L:100 + * The Launcher code itself takes us out into userspace, that scary place where + * pointers run wild and free! Unfortunately, like most userspace programs, + * it's quite boring (which is why everyone likes to hack on the kernel!). + * Perhaps if you make up an Lguest Drinking Game at this point, it will get + * you through this section. Or, maybe not. * * The Launcher sets up a big chunk of memory to be the Guest's "physical" * memory and stores it in "guest_base". In other words, Guest physical == @@ -253,7 +265,8 @@ static u8 *get_feature_bits(struct device *dev) * * This can be tough to get your head around, but usually it just means that we * use these trivial conversion functions when the Guest gives us it's - * "physical" addresses: */ + * "physical" addresses: + */ static void *from_guest_phys(unsigned long addr) { return guest_base + addr; @@ -268,7 +281,8 @@ static unsigned long to_guest_phys(const void *addr) * Loading the Kernel. * * We start with couple of simple helper routines. open_or_die() avoids - * error-checking code cluttering the callers: */ + * error-checking code cluttering the callers: + */ static int open_or_die(const char *name, int flags) { int fd = open(name, flags); @@ -283,8 +297,10 @@ static void *map_zeroed_pages(unsigned int num) int fd = open_or_die("/dev/zero", O_RDONLY); void *addr; - /* We use a private mapping (ie. if we write to the page, it will be - * copied). */ + /* + * We use a private mapping (ie. if we write to the page, it will be + * copied). + */ addr = mmap(NULL, getpagesize() * num, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, fd, 0); if (addr == MAP_FAILED) @@ -305,20 +321,24 @@ static void *get_pages(unsigned int num) return addr; } -/* This routine is used to load the kernel or initrd. It tries mmap, but if +/* + * This routine is used to load the kernel or initrd. It tries mmap, but if * that fails (Plan 9's kernel file isn't nicely aligned on page boundaries), - * it falls back to reading the memory in. */ + * it falls back to reading the memory in. + */ static void map_at(int fd, void *addr, unsigned long offset, unsigned long len) { ssize_t r; - /* We map writable even though for some segments are marked read-only. + /* + * We map writable even though for some segments are marked read-only. * The kernel really wants to be writable: it patches its own * instructions. * * MAP_PRIVATE means that the page won't be copied until a write is * done to it. This allows us to share untouched memory between - * Guests. */ + * Guests. + */ if (mmap(addr, len, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_FIXED|MAP_PRIVATE, fd, offset) != MAP_FAILED) return; @@ -329,7 +349,8 @@ static void map_at(int fd, void *addr, unsigned long offset, unsigned long len) err(1, "Reading offset %lu len %lu gave %zi", offset, len, r); } -/* This routine takes an open vmlinux image, which is in ELF, and maps it into +/* + * This routine takes an open vmlinux image, which is in ELF, and maps it into * the Guest memory. ELF = Embedded Linking Format, which is the format used * by all modern binaries on Linux including the kernel. * @@ -337,23 +358,28 @@ static void map_at(int fd, void *addr, unsigned long offset, unsigned long len) * address. We use the physical address; the Guest will map itself to the * virtual address. * - * We return the starting address. */ + * We return the starting address. + */ static unsigned long map_elf(int elf_fd, const Elf32_Ehdr *ehdr) { Elf32_Phdr phdr[ehdr->e_phnum]; unsigned int i; - /* Sanity checks on the main ELF header: an x86 executable with a - * reasonable number of correctly-sized program headers. */ + /* + * Sanity checks on the main ELF header: an x86 executable with a + * reasonable number of correctly-sized program headers. + */ if (ehdr->e_type != ET_EXEC || ehdr->e_machine != EM_386 || ehdr->e_phentsize != sizeof(Elf32_Phdr) || ehdr->e_phnum < 1 || ehdr->e_phnum > 65536U/sizeof(Elf32_Phdr)) errx(1, "Malformed elf header"); - /* An ELF executable contains an ELF header and a number of "program" + /* + * An ELF executable contains an ELF header and a number of "program" * headers which indicate which parts ("segments") of the program to - * load where. */ + * load where. + */ /* We read in all the program headers at once: */ if (lseek(elf_fd, ehdr->e_phoff, SEEK_SET) < 0) @@ -361,8 +387,10 @@ static unsigned long map_elf(int elf_fd, const Elf32_Ehdr *ehdr) if (read(elf_fd, phdr, sizeof(phdr)) != sizeof(phdr)) err(1, "Reading program headers"); - /* Try all the headers: there are usually only three. A read-only one, - * a read-write one, and a "note" section which we don't load. */ + /* + * Try all the headers: there are usually only three. A read-only one, + * a read-write one, and a "note" section which we don't load. + */ for (i = 0; i < ehdr->e_phnum; i++) { /* If this isn't a loadable segment, we ignore it */ if (phdr[i].p_type != PT_LOAD) @@ -380,13 +408,15 @@ static unsigned long map_elf(int elf_fd, const Elf32_Ehdr *ehdr) return ehdr->e_entry; } -/*L:150 A bzImage, unlike an ELF file, is not meant to be loaded. You're - * supposed to jump into it and it will unpack itself. We used to have to - * perform some hairy magic because the unpacking code scared me. +/*L:150 + * A bzImage, unlike an ELF file, is not meant to be loaded. You're supposed + * to jump into it and it will unpack itself. We used to have to perform some + * hairy magic because the unpacking code scared me. * * Fortunately, Jeremy Fitzhardinge convinced me it wasn't that hard and wrote * a small patch to jump over the tricky bits in the Guest, so now we just read - * the funky header so we know where in the file to load, and away we go! */ + * the funky header so we know where in the file to load, and away we go! + */ static unsigned long load_bzimage(int fd) { struct boot_params boot; @@ -394,8 +424,10 @@ static unsigned long load_bzimage(int fd) /* Modern bzImages get loaded at 1M. */ void *p = from_guest_phys(0x100000); - /* Go back to the start of the file and read the header. It should be - * a Linux boot header (see Documentation/x86/i386/boot.txt) */ + /* + * Go back to the start of the file and read the header. It should be + * a Linux boot header (see Documentation/x86/i386/boot.txt) + */ lseek(fd, 0, SEEK_SET); read(fd, &boot, sizeof(boot)); @@ -414,9 +446,11 @@ static unsigned long load_bzimage(int fd) return boot.hdr.code32_start; } -/*L:140 Loading the kernel is easy when it's a "vmlinux", but most kernels +/*L:140 + * Loading the kernel is easy when it's a "vmlinux", but most kernels * come wrapped up in the self-decompressing "bzImage" format. With a little - * work, we can load those, too. */ + * work, we can load those, too. + */ static unsigned long load_kernel(int fd) { Elf32_Ehdr hdr; @@ -433,24 +467,28 @@ static unsigned long load_kernel(int fd) return load_bzimage(fd); } -/* This is a trivial little helper to align pages. Andi Kleen hated it because +/* + * This is a trivial little helper to align pages. Andi Kleen hated it because * it calls getpagesize() twice: "it's dumb code." * * Kernel guys get really het up about optimization, even when it's not - * necessary. I leave this code as a reaction against that. */ + * necessary. I leave this code as a reaction against that. + */ static inline unsigned long page_align(unsigned long addr) { /* Add upwards and truncate downwards. */ return ((addr + getpagesize()-1) & ~(getpagesize()-1)); } -/*L:180 An "initial ram disk" is a disk image loaded into memory along with - * the kernel which the kernel can use to boot from without needing any - * drivers. Most distributions now use this as standard: the initrd contains - * the code to load the appropriate driver modules for the current machine. +/*L:180 + * An "initial ram disk" is a disk image loaded into memory along with the + * kernel which the kernel can use to boot from without needing any drivers. + * Most distributions now use this as standard: the initrd contains the code to + * load the appropriate driver modules for the current machine. * * Importantly, James Morris works for RedHat, and Fedora uses initrds for its - * kernels. He sent me this (and tells me when I break it). */ + * kernels. He sent me this (and tells me when I break it). + */ static unsigned long load_initrd(const char *name, unsigned long mem) { int ifd; @@ -462,12 +500,16 @@ static unsigned long load_initrd(const char *name, unsigned long mem) if (fstat(ifd, &st) < 0) err(1, "fstat() on initrd '%s'", name); - /* We map the initrd at the top of memory, but mmap wants it to be - * page-aligned, so we round the size up for that. */ + /* + * We map the initrd at the top of memory, but mmap wants it to be + * page-aligned, so we round the size up for that. + */ len = page_align(st.st_size); map_at(ifd, from_guest_phys(mem - len), 0, st.st_size); - /* Once a file is mapped, you can close the file descriptor. It's a - * little odd, but quite useful. */ + /* + * Once a file is mapped, you can close the file descriptor. It's a + * little odd, but quite useful. + */ close(ifd); verbose("mapped initrd %s size=%lu @ %p\n", name, len, (void*)mem-len); @@ -476,8 +518,10 @@ static unsigned long load_initrd(const char *name, unsigned long mem) } /*:*/ -/* Simple routine to roll all the commandline arguments together with spaces - * between them. */ +/* + * Simple routine to roll all the commandline arguments together with spaces + * between them. + */ static void concat(char *dst, char *args[]) { unsigned int i, len = 0; @@ -494,10 +538,12 @@ static void concat(char *dst, char *args[]) dst[len] = '\0'; } -/*L:185 This is where we actually tell the kernel to initialize the Guest. We +/*L:185 + * This is where we actually tell the kernel to initialize the Guest. We * saw the arguments it expects when we looked at initialize() in lguest_user.c: * the base of Guest "physical" memory, the top physical page to allow and the - * entry point for the Guest. */ + * entry point for the Guest. + */ static void tell_kernel(unsigned long start) { unsigned long args[] = { LHREQ_INITIALIZE, @@ -522,20 +568,26 @@ static void tell_kernel(unsigned long start) static void *_check_pointer(unsigned long addr, unsigned int size, unsigned int line) { - /* We have to separately check addr and addr+size, because size could - * be huge and addr + size might wrap around. */ + /* + * We have to separately check addr and addr+size, because size could + * be huge and addr + size might wrap around. + */ if (addr >= guest_limit || addr + size >= guest_limit) errx(1, "%s:%i: Invalid address %#lx", __FILE__, line, addr); - /* We return a pointer for the caller's convenience, now we know it's - * safe to use. */ + /* + * We return a pointer for the caller's convenience, now we know it's + * safe to use. + */ return from_guest_phys(addr); } /* A macro which transparently hands the line number to the real function. */ #define check_pointer(addr,size) _check_pointer(addr, size, __LINE__) -/* Each buffer in the virtqueues is actually a chain of descriptors. This +/* + * Each buffer in the virtqueues is actually a chain of descriptors. This * function returns the next descriptor in the chain, or vq->vring.num if we're - * at the end. */ + * at the end. + */ static unsigned next_desc(struct vring_desc *desc, unsigned int i, unsigned int max) { @@ -576,12 +628,14 @@ static void trigger_irq(struct virtqueue *vq) err(1, "Triggering irq %i", vq->config.irq); } -/* This looks in the virtqueue and for the first available buffer, and converts +/* + * This looks in the virtqueue and for the first available buffer, and converts * it to an iovec for convenient access. Since descriptors consist of some * number of output then some number of input descriptors, it's actually two * iovecs, but we pack them into one and note how many of each there were. * - * This function returns the descriptor number found. */ + * This function returns the descriptor number found. + */ static unsigned wait_for_vq_desc(struct virtqueue *vq, struct iovec iov[], unsigned int *out_num, unsigned int *in_num) @@ -599,8 +653,10 @@ static unsigned wait_for_vq_desc(struct virtqueue *vq, /* OK, now we need to know about added descriptors. */ vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY; - /* They could have slipped one in as we were doing that: make - * sure it's written, then check again. */ + /* + * They could have slipped one in as we were doing that: make + * sure it's written, then check again. + */ mb(); if (last_avail != vq->vring.avail->idx) { vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY; @@ -620,8 +676,10 @@ static unsigned wait_for_vq_desc(struct virtqueue *vq, errx(1, "Guest moved used index from %u to %u", last_avail, vq->vring.avail->idx); - /* Grab the next descriptor number they're advertising, and increment - * the index we've seen. */ + /* + * Grab the next descriptor number they're advertising, and increment + * the index we've seen. + */ head = vq->vring.avail->ring[last_avail % vq->vring.num]; lg_last_avail(vq)++; @@ -636,8 +694,10 @@ static unsigned wait_for_vq_desc(struct virtqueue *vq, desc = vq->vring.desc; i = head; - /* If this is an indirect entry, then this buffer contains a descriptor - * table which we handle as if it's any normal descriptor chain. */ + /* + * If this is an indirect entry, then this buffer contains a descriptor + * table which we handle as if it's any normal descriptor chain. + */ if (desc[i].flags & VRING_DESC_F_INDIRECT) { if (desc[i].len % sizeof(struct vring_desc)) errx(1, "Invalid size for indirect buffer table"); @@ -656,8 +716,10 @@ static unsigned wait_for_vq_desc(struct virtqueue *vq, if (desc[i].flags & VRING_DESC_F_WRITE) (*in_num)++; else { - /* If it's an output descriptor, they're all supposed - * to come before any input descriptors. */ + /* + * If it's an output descriptor, they're all supposed + * to come before any input descriptors. + */ if (*in_num) errx(1, "Descriptor has out after in"); (*out_num)++; @@ -671,14 +733,18 @@ static unsigned wait_for_vq_desc(struct virtqueue *vq, return head; } -/* After we've used one of their buffers, we tell them about it. We'll then - * want to send them an interrupt, using trigger_irq(). */ +/* + * After we've used one of their buffers, we tell them about it. We'll then + * want to send them an interrupt, using trigger_irq(). + */ static void add_used(struct virtqueue *vq, unsigned int head, int len) { struct vring_used_elem *used; - /* The virtqueue contains a ring of used buffers. Get a pointer to the - * next entry in that used ring. */ + /* + * The virtqueue contains a ring of used buffers. Get a pointer to the + * next entry in that used ring. + */ used = &vq->vring.used->ring[vq->vring.used->idx % vq->vring.num]; used->id = head; used->len = len; @@ -698,7 +764,8 @@ static void add_used_and_trigger(struct virtqueue *vq, unsigned head, int len) /* * The Console * - * We associate some data with the console for our exit hack. */ + * We associate some data with the console for our exit hack. + */ struct console_abort { /* How many times have they hit ^C? */ @@ -725,20 +792,24 @@ static void console_input(struct virtqueue *vq) if (len <= 0) { /* Ran out of input? */ warnx("Failed to get console input, ignoring console."); - /* For simplicity, dying threads kill the whole Launcher. So - * just nap here. */ + /* + * For simplicity, dying threads kill the whole Launcher. So + * just nap here. + */ for (;;) pause(); } add_used_and_trigger(vq, head, len); - /* Three ^C within one second? Exit. + /* + * Three ^C within one second? Exit. * * This is such a hack, but works surprisingly well. Each ^C has to * be in a buffer by itself, so they can't be too fast. But we check * that we get three within about a second, so they can't be too - * slow. */ + * slow. + */ if (len != 1 || ((char *)iov[0].iov_base)[0] != 3) { abort->count = 0; return; @@ -809,8 +880,7 @@ static bool will_block(int fd) return select(fd+1, &fdset, NULL, NULL, &zero) != 1; } -/* This is where we handle packets coming in from the tun device to our - * Guest. */ +/* This handles packets coming in from the tun device to our Guest. */ static void net_input(struct virtqueue *vq) { int len; @@ -842,8 +912,10 @@ static int do_thread(void *_vq) return 0; } -/* When a child dies, we kill our entire process group with SIGTERM. This - * also has the side effect that the shell restores the console for us! */ +/* + * When a child dies, we kill our entire process group with SIGTERM. This + * also has the side effect that the shell restores the console for us! + */ static void kill_launcher(int signal) { kill(0, SIGTERM); @@ -880,9 +952,10 @@ static void reset_device(struct device *dev) static void create_thread(struct virtqueue *vq) { - /* Create stack for thread and run it. Since stack grows - * upwards, we point the stack pointer to the end of this - * region. */ + /* + * Create stack for thread and run it. Since the stack grows upwards, + * we point the stack pointer to the end of this region. + */ char *stack = malloc(32768); unsigned long args[] = { LHREQ_EVENTFD, vq->config.pfn*getpagesize(), 0 }; @@ -981,8 +1054,11 @@ static void handle_output(unsigned long addr) } } - /* Early console write is done using notify on a nul-terminated string - * in Guest memory. */ + /* + * Early console write is done using notify on a nul-terminated string + * in Guest memory. It's also great for hacking debugging messages + * into a Guest. + */ if (addr >= guest_limit) errx(1, "Bad NOTIFY %#lx", addr); @@ -998,10 +1074,12 @@ static void handle_output(unsigned long addr) * routines to allocate and manage them. */ -/* The layout of the device page is a "struct lguest_device_desc" followed by a +/* + * The layout of the device page is a "struct lguest_device_desc" followed by a * number of virtqueue descriptors, then two sets of feature bits, then an * array of configuration bytes. This routine returns the configuration - * pointer. */ + * pointer. + */ static u8 *device_config(const struct device *dev) { return (void *)(dev->desc + 1) @@ -1009,9 +1087,11 @@ static u8 *device_config(const struct device *dev) + dev->feature_len * 2; } -/* This routine allocates a new "struct lguest_device_desc" from descriptor +/* + * This routine allocates a new "struct lguest_device_desc" from descriptor * table page just above the Guest's normal memory. It returns a pointer to - * that descriptor. */ + * that descriptor. + */ static struct lguest_device_desc *new_dev_desc(u16 type) { struct lguest_device_desc d = { .type = type }; @@ -1032,8 +1112,10 @@ static struct lguest_device_desc *new_dev_desc(u16 type) return memcpy(p, &d, sizeof(d)); } -/* Each device descriptor is followed by the description of its virtqueues. We - * specify how many descriptors the virtqueue is to have. */ +/* + * Each device descriptor is followed by the description of its virtqueues. We + * specify how many descriptors the virtqueue is to have. + */ static void add_virtqueue(struct device *dev, unsigned int num_descs, void (*service)(struct virtqueue *)) { @@ -1061,10 +1143,12 @@ static void add_virtqueue(struct device *dev, unsigned int num_descs, /* Initialize the vring. */ vring_init(&vq->vring, num_descs, p, LGUEST_VRING_ALIGN); - /* Append virtqueue to this device's descriptor. We use + /* + * Append virtqueue to this device's descriptor. We use * device_config() to get the end of the device's current virtqueues; * we check that we haven't added any config or feature information - * yet, otherwise we'd be overwriting them. */ + * yet, otherwise we'd be overwriting them. + */ assert(dev->desc->config_len == 0 && dev->desc->feature_len == 0); memcpy(device_config(dev), &vq->config, sizeof(vq->config)); dev->num_vq++; @@ -1072,14 +1156,18 @@ static void add_virtqueue(struct device *dev, unsigned int num_descs, verbose("Virtqueue page %#lx\n", to_guest_phys(p)); - /* Add to tail of list, so dev->vq is first vq, dev->vq->next is - * second. */ + /* + * Add to tail of list, so dev->vq is first vq, dev->vq->next is + * second. + */ for (i = &dev->vq; *i; i = &(*i)->next); *i = vq; } -/* The first half of the feature bitmask is for us to advertise features. The - * second half is for the Guest to accept features. */ +/* + * The first half of the feature bitmask is for us to advertise features. The + * second half is for the Guest to accept features. + */ static void add_feature(struct device *dev, unsigned bit) { u8 *features = get_feature_bits(dev); @@ -1093,9 +1181,11 @@ static void add_feature(struct device *dev, unsigned bit) features[bit / CHAR_BIT] |= (1 << (bit % CHAR_BIT)); } -/* This routine sets the configuration fields for an existing device's +/* + * This routine sets the configuration fields for an existing device's * descriptor. It only works for the last device, but that's OK because that's - * how we use it. */ + * how we use it. + */ static void set_config(struct device *dev, unsigned len, const void *conf) { /* Check we haven't overflowed our single page. */ @@ -1110,10 +1200,12 @@ static void set_config(struct device *dev, unsigned len, const void *conf) assert(dev->desc->config_len == len); } -/* This routine does all the creation and setup of a new device, including +/* + * This routine does all the creation and setup of a new device, including * calling new_dev_desc() to allocate the descriptor and device memory. * - * See what I mean about userspace being boring? */ + * See what I mean about userspace being boring? + */ static struct device *new_device(const char *name, u16 type) { struct device *dev = malloc(sizeof(*dev)); @@ -1126,10 +1218,12 @@ static struct device *new_device(const char *name, u16 type) dev->num_vq = 0; dev->running = false; - /* Append to device list. Prepending to a single-linked list is + /* + * Append to device list. Prepending to a single-linked list is * easier, but the user expects the devices to be arranged on the bus * in command-line order. The first network device on the command line - * is eth0, the first block device /dev/vda, etc. */ + * is eth0, the first block device /dev/vda, etc. + */ if (devices.lastdev) devices.lastdev->next = dev; else @@ -1139,8 +1233,10 @@ static struct device *new_device(const char *name, u16 type) return dev; } -/* Our first setup routine is the console. It's a fairly simple device, but - * UNIX tty handling makes it uglier than it could be. */ +/* + * Our first setup routine is the console. It's a fairly simple device, but + * UNIX tty handling makes it uglier than it could be. + */ static void setup_console(void) { struct device *dev; @@ -1148,8 +1244,10 @@ static void setup_console(void) /* If we can save the initial standard input settings... */ if (tcgetattr(STDIN_FILENO, &orig_term) == 0) { struct termios term = orig_term; - /* Then we turn off echo, line buffering and ^C etc. We want a - * raw input stream to the Guest. */ + /* + * Then we turn off echo, line buffering and ^C etc: We want a + * raw input stream to the Guest. + */ term.c_lflag &= ~(ISIG|ICANON|ECHO); tcsetattr(STDIN_FILENO, TCSANOW, &term); } @@ -1160,10 +1258,12 @@ static void setup_console(void) dev->priv = malloc(sizeof(struct console_abort)); ((struct console_abort *)dev->priv)->count = 0; - /* The console needs two virtqueues: the input then the output. When + /* + * The console needs two virtqueues: the input then the output. When * they put something the input queue, we make sure we're listening to * stdin. When they put something in the output queue, we write it to - * stdout. */ + * stdout. + */ add_virtqueue(dev, VIRTQUEUE_NUM, console_input); add_virtqueue(dev, VIRTQUEUE_NUM, console_output); @@ -1171,7 +1271,8 @@ static void setup_console(void) } /*:*/ -/*M:010 Inter-guest networking is an interesting area. Simplest is to have a +/*M:010 + * Inter-guest networking is an interesting area. Simplest is to have a * --sharenet= option which opens or creates a named pipe. This can be * used to send packets to another guest in a 1:1 manner. * @@ -1185,7 +1286,8 @@ static void setup_console(void) * multiple inter-guest channels behind one interface, although it would * require some manner of hotplugging new virtio channels. * - * Finally, we could implement a virtio network switch in the kernel. :*/ + * Finally, we could implement a virtio network switch in the kernel. +:*/ static u32 str2ip(const char *ipaddr) { @@ -1210,11 +1312,13 @@ static void str2mac(const char *macaddr, unsigned char mac[6]) mac[5] = m[5]; } -/* This code is "adapted" from libbridge: it attaches the Host end of the +/* + * This code is "adapted" from libbridge: it attaches the Host end of the * network device to the bridge device specified by the command line. * * This is yet another James Morris contribution (I'm an IP-level guy, so I - * dislike bridging), and I just try not to break it. */ + * dislike bridging), and I just try not to break it. + */ static void add_to_bridge(int fd, const char *if_name, const char *br_name) { int ifidx; @@ -1234,9 +1338,11 @@ static void add_to_bridge(int fd, const char *if_name, const char *br_name) err(1, "can't add %s to bridge %s", if_name, br_name); } -/* This sets up the Host end of the network device with an IP address, brings +/* + * This sets up the Host end of the network device with an IP address, brings * it up so packets will flow, the copies the MAC address into the hwaddr - * pointer. */ + * pointer. + */ static void configure_device(int fd, const char *tapif, u32 ipaddr) { struct ifreq ifr; @@ -1263,10 +1369,12 @@ static int get_tun_device(char tapif[IFNAMSIZ]) /* Start with this zeroed. Messy but sure. */ memset(&ifr, 0, sizeof(ifr)); - /* We open the /dev/net/tun device and tell it we want a tap device. A + /* + * We open the /dev/net/tun device and tell it we want a tap device. A * tap device is like a tun device, only somehow different. To tell * the truth, I completely blundered my way through this code, but it - * works now! */ + * works now! + */ netfd = open_or_die("/dev/net/tun", O_RDWR); ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR; strcpy(ifr.ifr_name, "tap%d"); @@ -1277,18 +1385,22 @@ static int get_tun_device(char tapif[IFNAMSIZ]) TUN_F_CSUM|TUN_F_TSO4|TUN_F_TSO6|TUN_F_TSO_ECN) != 0) err(1, "Could not set features for tun device"); - /* We don't need checksums calculated for packets coming in this - * device: trust us! */ + /* + * We don't need checksums calculated for packets coming in this + * device: trust us! + */ ioctl(netfd, TUNSETNOCSUM, 1); memcpy(tapif, ifr.ifr_name, IFNAMSIZ); return netfd; } -/*L:195 Our network is a Host<->Guest network. This can either use bridging or +/*L:195 + * Our network is a Host<->Guest network. This can either use bridging or * routing, but the principle is the same: it uses the "tun" device to inject * packets into the Host as if they came in from a normal network card. We - * just shunt packets between the Guest and the tun device. */ + * just shunt packets between the Guest and the tun device. + */ static void setup_tun_net(char *arg) { struct device *dev; @@ -1305,13 +1417,14 @@ static void setup_tun_net(char *arg) dev = new_device("net", VIRTIO_ID_NET); dev->priv = net_info; - /* Network devices need a receive and a send queue, just like - * console. */ + /* Network devices need a recv and a send queue, just like console. */ add_virtqueue(dev, VIRTQUEUE_NUM, net_input); add_virtqueue(dev, VIRTQUEUE_NUM, net_output); - /* We need a socket to perform the magic network ioctls to bring up the - * tap interface, connect to the bridge etc. Any socket will do! */ + /* + * We need a socket to perform the magic network ioctls to bring up the + * tap interface, connect to the bridge etc. Any socket will do! + */ ipfd = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP); if (ipfd < 0) err(1, "opening IP socket"); @@ -1366,7 +1479,8 @@ static void setup_tun_net(char *arg) devices.device_num, tapif, arg); } -/* Our block (disk) device should be really simple: the Guest asks for a block +/* + * Our block (disk) device should be really simple: the Guest asks for a block * number and we read or write that position in the file. Unfortunately, that * was amazingly slow: the Guest waits until the read is finished before * running anything else, even if it could have been doing useful work. @@ -1374,7 +1488,9 @@ static void setup_tun_net(char *arg) * We could use async I/O, except it's reputed to suck so hard that characters * actually go missing from your code when you try to use it. * - * So we farm the I/O out to thread, and communicate with it via a pipe. */ + * So this was one reason why lguest now does all virtqueue servicing in + * separate threads: it's more efficient and more like a real device. + */ /* This hangs off device->priv. */ struct vblk_info @@ -1412,9 +1528,11 @@ static void blk_request(struct virtqueue *vq) /* Get the next request. */ head = wait_for_vq_desc(vq, iov, &out_num, &in_num); - /* Every block request should contain at least one output buffer + /* + * Every block request should contain at least one output buffer * (detailing the location on disk and the type of request) and one - * input buffer (to hold the result). */ + * input buffer (to hold the result). + */ if (out_num == 0 || in_num == 0) errx(1, "Bad virtblk cmd %u out=%u in=%u", head, out_num, in_num); @@ -1423,33 +1541,41 @@ static void blk_request(struct virtqueue *vq) in = convert(&iov[out_num+in_num-1], u8); off = out->sector * 512; - /* The block device implements "barriers", where the Guest indicates + /* + * The block device implements "barriers", where the Guest indicates * that it wants all previous writes to occur before this write. We * don't have a way of asking our kernel to do a barrier, so we just - * synchronize all the data in the file. Pretty poor, no? */ + * synchronize all the data in the file. Pretty poor, no? + */ if (out->type & VIRTIO_BLK_T_BARRIER) fdatasync(vblk->fd); - /* In general the virtio block driver is allowed to try SCSI commands. - * It'd be nice if we supported eject, for example, but we don't. */ + /* + * In general the virtio block driver is allowed to try SCSI commands. + * It'd be nice if we supported eject, for example, but we don't. + */ if (out->type & VIRTIO_BLK_T_SCSI_CMD) { fprintf(stderr, "Scsi commands unsupported\n"); *in = VIRTIO_BLK_S_UNSUPP; wlen = sizeof(*in); } else if (out->type & VIRTIO_BLK_T_OUT) { - /* Write */ - - /* Move to the right location in the block file. This can fail - * if they try to write past end. */ + /* + * Write + * + * Move to the right location in the block file. This can fail + * if they try to write past end. + */ if (lseek64(vblk->fd, off, SEEK_SET) != off) err(1, "Bad seek to sector %llu", out->sector); ret = writev(vblk->fd, iov+1, out_num-1); verbose("WRITE to sector %llu: %i\n", out->sector, ret); - /* Grr... Now we know how long the descriptor they sent was, we + /* + * Grr... Now we know how long the descriptor they sent was, we * make sure they didn't try to write over the end of the block - * file (possibly extending it). */ + * file (possibly extending it). + */ if (ret > 0 && off + ret > vblk->len) { /* Trim it back to the correct length */ ftruncate64(vblk->fd, vblk->len); @@ -1459,10 +1585,12 @@ static void blk_request(struct virtqueue *vq) wlen = sizeof(*in); *in = (ret >= 0 ? VIRTIO_BLK_S_OK : VIRTIO_BLK_S_IOERR); } else { - /* Read */ - - /* Move to the right location in the block file. This can fail - * if they try to read past end. */ + /* + * Read + * + * Move to the right location in the block file. This can fail + * if they try to read past end. + */ if (lseek64(vblk->fd, off, SEEK_SET) != off) err(1, "Bad seek to sector %llu", out->sector); @@ -1477,10 +1605,12 @@ static void blk_request(struct virtqueue *vq) } } - /* OK, so we noted that it was pretty poor to use an fdatasync as a + /* + * OK, so we noted that it was pretty poor to use an fdatasync as a * barrier. But Christoph Hellwig points out that we need a sync * *afterwards* as well: "Barriers specify no reordering to the front - * or the back." And Jens Axboe confirmed it, so here we are: */ + * or the back." And Jens Axboe confirmed it, so here we are: + */ if (out->type & VIRTIO_BLK_T_BARRIER) fdatasync(vblk->fd); @@ -1494,7 +1624,7 @@ static void setup_block_file(const char *filename) struct vblk_info *vblk; struct virtio_blk_config conf; - /* The device responds to return from I/O thread. */ + /* Creat the device. */ dev = new_device("block", VIRTIO_ID_BLOCK); /* The device has one virtqueue, where the Guest places requests. */ @@ -1513,8 +1643,10 @@ static void setup_block_file(const char *filename) /* Tell Guest how many sectors this device has. */ conf.capacity = cpu_to_le64(vblk->len / 512); - /* Tell Guest not to put in too many descriptors at once: two are used - * for the in and out elements. */ + /* + * Tell Guest not to put in too many descriptors at once: two are used + * for the in and out elements. + */ add_feature(dev, VIRTIO_BLK_F_SEG_MAX); conf.seg_max = cpu_to_le32(VIRTQUEUE_NUM - 2); @@ -1525,16 +1657,18 @@ static void setup_block_file(const char *filename) ++devices.device_num, le64_to_cpu(conf.capacity)); } -struct rng_info { - int rfd; -}; - -/* Our random number generator device reads from /dev/random into the Guest's +/*L:211 + * Our random number generator device reads from /dev/random into the Guest's * input buffers. The usual case is that the Guest doesn't want random numbers * and so has no buffers although /dev/random is still readable, whereas * console is the reverse. * - * The same logic applies, however. */ + * The same logic applies, however. + */ +struct rng_info { + int rfd; +}; + static void rng_input(struct virtqueue *vq) { int len; @@ -1547,9 +1681,11 @@ static void rng_input(struct virtqueue *vq) if (out_num) errx(1, "Output buffers in rng?"); - /* This is why we convert to iovecs: the readv() call uses them, and so + /* + * This is why we convert to iovecs: the readv() call uses them, and so * it reads straight into the Guest's buffer. We loop to make sure we - * fill it. */ + * fill it. + */ while (!iov_empty(iov, in_num)) { len = readv(rng_info->rfd, iov, in_num); if (len <= 0) @@ -1562,15 +1698,18 @@ static void rng_input(struct virtqueue *vq) add_used(vq, head, totlen); } -/* And this creates a "hardware" random number device for the Guest. */ +/*L:199 + * This creates a "hardware" random number device for the Guest. + */ static void setup_rng(void) { struct device *dev; struct rng_info *rng_info = malloc(sizeof(*rng_info)); + /* Our device's privat info simply contains the /dev/random fd. */ rng_info->rfd = open_or_die("/dev/random", O_RDONLY); - /* The device responds to return from I/O thread. */ + /* Create the new device. */ dev = new_device("rng", VIRTIO_ID_RNG); dev->priv = rng_info; @@ -1586,8 +1725,10 @@ static void __attribute__((noreturn)) restart_guest(void) { unsigned int i; - /* Since we don't track all open fds, we simply close everything beyond - * stderr. */ + /* + * Since we don't track all open fds, we simply close everything beyond + * stderr. + */ for (i = 3; i < FD_SETSIZE; i++) close(i); @@ -1598,8 +1739,10 @@ static void __attribute__((noreturn)) restart_guest(void) err(1, "Could not exec %s", main_args[0]); } -/*L:220 Finally we reach the core of the Launcher which runs the Guest, serves - * its input and output, and finally, lays it to rest. */ +/*L:220 + * Finally we reach the core of the Launcher which runs the Guest, serves + * its input and output, and finally, lays it to rest. + */ static void __attribute__((noreturn)) run_guest(void) { for (;;) { @@ -1634,7 +1777,7 @@ static void __attribute__((noreturn)) run_guest(void) * * Are you ready? Take a deep breath and join me in the core of the Host, in * "make Host". - :*/ +:*/ static struct option opts[] = { { "verbose", 0, NULL, 'v' }, @@ -1655,8 +1798,7 @@ static void usage(void) /*L:105 The main routine is where the real work begins: */ int main(int argc, char *argv[]) { - /* Memory, top-level pagetable, code startpoint and size of the - * (optional) initrd. */ + /* Memory, code startpoint and size of the (optional) initrd. */ unsigned long mem = 0, start, initrd_size = 0; /* Two temporaries. */ int i, c; @@ -1668,24 +1810,30 @@ int main(int argc, char *argv[]) /* Save the args: we "reboot" by execing ourselves again. */ main_args = argv; - /* First we initialize the device list. We keep a pointer to the last + /* + * First we initialize the device list. We keep a pointer to the last * device, and the next interrupt number to use for devices (1: - * remember that 0 is used by the timer). */ + * remember that 0 is used by the timer). + */ devices.lastdev = NULL; devices.next_irq = 1; cpu_id = 0; - /* We need to know how much memory so we can set up the device + /* + * We need to know how much memory so we can set up the device * descriptor and memory pages for the devices as we parse the command * line. So we quickly look through the arguments to find the amount - * of memory now. */ + * of memory now. + */ for (i = 1; i < argc; i++) { if (argv[i][0] != '-') { mem = atoi(argv[i]) * 1024 * 1024; - /* We start by mapping anonymous pages over all of + /* + * We start by mapping anonymous pages over all of * guest-physical memory range. This fills it with 0, * and ensures that the Guest won't be killed when it - * tries to access it. */ + * tries to access it. + */ guest_base = map_zeroed_pages(mem / getpagesize() + DEVICE_PAGES); guest_limit = mem; @@ -1718,8 +1866,10 @@ int main(int argc, char *argv[]) usage(); } } - /* After the other arguments we expect memory and kernel image name, - * followed by command line arguments for the kernel. */ + /* + * After the other arguments we expect memory and kernel image name, + * followed by command line arguments for the kernel. + */ if (optind + 2 > argc) usage(); @@ -1737,20 +1887,26 @@ int main(int argc, char *argv[]) /* Map the initrd image if requested (at top of physical memory) */ if (initrd_name) { initrd_size = load_initrd(initrd_name, mem); - /* These are the location in the Linux boot header where the - * start and size of the initrd are expected to be found. */ + /* + * These are the location in the Linux boot header where the + * start and size of the initrd are expected to be found. + */ boot->hdr.ramdisk_image = mem - initrd_size; boot->hdr.ramdisk_size = initrd_size; /* The bootloader type 0xFF means "unknown"; that's OK. */ boot->hdr.type_of_loader = 0xFF; } - /* The Linux boot header contains an "E820" memory map: ours is a - * simple, single region. */ + /* + * The Linux boot header contains an "E820" memory map: ours is a + * simple, single region. + */ boot->e820_entries = 1; boot->e820_map[0] = ((struct e820entry) { 0, mem, E820_RAM }); - /* The boot header contains a command line pointer: we put the command - * line after the boot header. */ + /* + * The boot header contains a command line pointer: we put the command + * line after the boot header. + */ boot->hdr.cmd_line_ptr = to_guest_phys(boot + 1); /* We use a simple helper to copy the arguments separated by spaces. */ concat((char *)(boot + 1), argv+optind+2); @@ -1764,8 +1920,10 @@ int main(int argc, char *argv[]) /* Tell the entry path not to try to reload segment registers. */ boot->hdr.loadflags |= KEEP_SEGMENTS; - /* We tell the kernel to initialize the Guest: this returns the open - * /dev/lguest file descriptor. */ + /* + * We tell the kernel to initialize the Guest: this returns the open + * /dev/lguest file descriptor. + */ tell_kernel(start); /* Ensure that we terminate if a child dies. */ -- cgit v1.2.2 From a91d74a3c4de8115295ee87350c13a329164aaaf Mon Sep 17 00:00:00 2001 From: Rusty Russell Date: Thu, 30 Jul 2009 16:03:45 -0600 Subject: lguest: update commentry Every so often, after code shuffles, I need to go through and unbitrot the Lguest Journey (see drivers/lguest/README). Since we now use RCU in a simple form in one place I took the opportunity to expand that explanation. Signed-off-by: Rusty Russell Cc: Ingo Molnar Cc: Paul McKenney --- Documentation/lguest/lguest.c | 184 +++++++++++++++++++++++++++++++----------- 1 file changed, 139 insertions(+), 45 deletions(-) (limited to 'Documentation') diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c index aa66a52b73e9..45163651b519 100644 --- a/Documentation/lguest/lguest.c +++ b/Documentation/lguest/lguest.c @@ -49,7 +49,7 @@ #include "linux/virtio_ring.h" #include "asm/bootparam.h" /*L:110 - * We can ignore the 39 include files we need for this program, but I do want + * We can ignore the 42 include files we need for this program, but I do want * to draw attention to the use of kernel-style types. * * As Linus said, "C is a Spartan language, and so should your naming be." I @@ -305,6 +305,11 @@ static void *map_zeroed_pages(unsigned int num) PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, fd, 0); if (addr == MAP_FAILED) err(1, "Mmaping %u pages of /dev/zero", num); + + /* + * One neat mmap feature is that you can close the fd, and it + * stays mapped. + */ close(fd); return addr; @@ -557,7 +562,7 @@ static void tell_kernel(unsigned long start) } /*:*/ -/* +/*L:200 * Device Handling. * * When the Guest gives us a buffer, it sends an array of addresses and sizes. @@ -608,7 +613,10 @@ static unsigned next_desc(struct vring_desc *desc, return next; } -/* This actually sends the interrupt for this virtqueue */ +/* + * This actually sends the interrupt for this virtqueue, if we've used a + * buffer. + */ static void trigger_irq(struct virtqueue *vq) { unsigned long buf[] = { LHREQ_IRQ, vq->config.irq }; @@ -629,12 +637,12 @@ static void trigger_irq(struct virtqueue *vq) } /* - * This looks in the virtqueue and for the first available buffer, and converts + * This looks in the virtqueue for the first available buffer, and converts * it to an iovec for convenient access. Since descriptors consist of some * number of output then some number of input descriptors, it's actually two * iovecs, but we pack them into one and note how many of each there were. * - * This function returns the descriptor number found. + * This function waits if necessary, and returns the descriptor number found. */ static unsigned wait_for_vq_desc(struct virtqueue *vq, struct iovec iov[], @@ -644,10 +652,14 @@ static unsigned wait_for_vq_desc(struct virtqueue *vq, struct vring_desc *desc; u16 last_avail = lg_last_avail(vq); + /* There's nothing available? */ while (last_avail == vq->vring.avail->idx) { u64 event; - /* OK, tell Guest about progress up to now. */ + /* + * Since we're about to sleep, now is a good time to tell the + * Guest about what we've used up to now. + */ trigger_irq(vq); /* OK, now we need to know about added descriptors. */ @@ -734,8 +746,9 @@ static unsigned wait_for_vq_desc(struct virtqueue *vq, } /* - * After we've used one of their buffers, we tell them about it. We'll then - * want to send them an interrupt, using trigger_irq(). + * After we've used one of their buffers, we tell the Guest about it. Sometime + * later we'll want to send them an interrupt using trigger_irq(); note that + * wait_for_vq_desc() does that for us if it has to wait. */ static void add_used(struct virtqueue *vq, unsigned int head, int len) { @@ -782,12 +795,12 @@ static void console_input(struct virtqueue *vq) struct console_abort *abort = vq->dev->priv; struct iovec iov[vq->vring.num]; - /* Make sure there's a descriptor waiting. */ + /* Make sure there's a descriptor available. */ head = wait_for_vq_desc(vq, iov, &out_num, &in_num); if (out_num) errx(1, "Output buffers in console in queue?"); - /* Read it in. */ + /* Read into it. This is where we usually wait. */ len = readv(STDIN_FILENO, iov, in_num); if (len <= 0) { /* Ran out of input? */ @@ -800,6 +813,7 @@ static void console_input(struct virtqueue *vq) pause(); } + /* Tell the Guest we used a buffer. */ add_used_and_trigger(vq, head, len); /* @@ -834,15 +848,23 @@ static void console_output(struct virtqueue *vq) unsigned int head, out, in; struct iovec iov[vq->vring.num]; + /* We usually wait in here, for the Guest to give us something. */ head = wait_for_vq_desc(vq, iov, &out, &in); if (in) errx(1, "Input buffers in console output queue?"); + + /* writev can return a partial write, so we loop here. */ while (!iov_empty(iov, out)) { int len = writev(STDOUT_FILENO, iov, out); if (len <= 0) err(1, "Write to stdout gave %i", len); iov_consume(iov, out, len); } + + /* + * We're finished with that buffer: if we're going to sleep, + * wait_for_vq_desc() will prod the Guest with an interrupt. + */ add_used(vq, head, 0); } @@ -862,15 +884,30 @@ static void net_output(struct virtqueue *vq) unsigned int head, out, in; struct iovec iov[vq->vring.num]; + /* We usually wait in here for the Guest to give us a packet. */ head = wait_for_vq_desc(vq, iov, &out, &in); if (in) errx(1, "Input buffers in net output queue?"); + /* + * Send the whole thing through to /dev/net/tun. It expects the exact + * same format: what a coincidence! + */ if (writev(net_info->tunfd, iov, out) < 0) errx(1, "Write to tun failed?"); + + /* + * Done with that one; wait_for_vq_desc() will send the interrupt if + * all packets are processed. + */ add_used(vq, head, 0); } -/* Will reading from this file descriptor block? */ +/* + * Handling network input is a bit trickier, because I've tried to optimize it. + * + * First we have a helper routine which tells is if from this file descriptor + * (ie. the /dev/net/tun device) will block: + */ static bool will_block(int fd) { fd_set fdset; @@ -880,7 +917,11 @@ static bool will_block(int fd) return select(fd+1, &fdset, NULL, NULL, &zero) != 1; } -/* This handles packets coming in from the tun device to our Guest. */ +/* + * This handles packets coming in from the tun device to our Guest. Like all + * service routines, it gets called again as soon as it returns, so you don't + * see a while(1) loop here. + */ static void net_input(struct virtqueue *vq) { int len; @@ -888,21 +929,38 @@ static void net_input(struct virtqueue *vq) struct iovec iov[vq->vring.num]; struct net_info *net_info = vq->dev->priv; + /* + * Get a descriptor to write an incoming packet into. This will also + * send an interrupt if they're out of descriptors. + */ head = wait_for_vq_desc(vq, iov, &out, &in); if (out) errx(1, "Output buffers in net input queue?"); - /* Deliver interrupt now, since we're about to sleep. */ + /* + * If it looks like we'll block reading from the tun device, send them + * an interrupt. + */ if (vq->pending_used && will_block(net_info->tunfd)) trigger_irq(vq); + /* + * Read in the packet. This is where we normally wait (when there's no + * incoming network traffic). + */ len = readv(net_info->tunfd, iov, in); if (len <= 0) err(1, "Failed to read from tun."); + + /* + * Mark that packet buffer as used, but don't interrupt here. We want + * to wait until we've done as much work as we can. + */ add_used(vq, head, len); } +/*:*/ -/* This is the helper to create threads. */ +/* This is the helper to create threads: run the service routine in a loop. */ static int do_thread(void *_vq) { struct virtqueue *vq = _vq; @@ -950,11 +1008,14 @@ static void reset_device(struct device *dev) signal(SIGCHLD, (void *)kill_launcher); } +/*L:216 + * This actually creates the thread which services the virtqueue for a device. + */ static void create_thread(struct virtqueue *vq) { /* - * Create stack for thread and run it. Since the stack grows upwards, - * we point the stack pointer to the end of this region. + * Create stack for thread. Since the stack grows upwards, we point + * the stack pointer to the end of this region. */ char *stack = malloc(32768); unsigned long args[] = { LHREQ_EVENTFD, @@ -966,17 +1027,22 @@ static void create_thread(struct virtqueue *vq) err(1, "Creating eventfd"); args[2] = vq->eventfd; - /* Attach an eventfd to this virtqueue: it will go off - * when the Guest does an LHCALL_NOTIFY for this vq. */ + /* + * Attach an eventfd to this virtqueue: it will go off when the Guest + * does an LHCALL_NOTIFY for this vq. + */ if (write(lguest_fd, &args, sizeof(args)) != 0) err(1, "Attaching eventfd"); - /* CLONE_VM: because it has to access the Guest memory, and - * SIGCHLD so we get a signal if it dies. */ + /* + * CLONE_VM: because it has to access the Guest memory, and SIGCHLD so + * we get a signal if it dies. + */ vq->thread = clone(do_thread, stack + 32768, CLONE_VM | SIGCHLD, vq); if (vq->thread == (pid_t)-1) err(1, "Creating clone"); - /* We close our local copy, now the child has it. */ + + /* We close our local copy now the child has it. */ close(vq->eventfd); } @@ -1028,7 +1094,10 @@ static void update_device_status(struct device *dev) } } -/* This is the generic routine we call when the Guest uses LHCALL_NOTIFY. */ +/*L:215 + * This is the generic routine we call when the Guest uses LHCALL_NOTIFY. In + * particular, it's used to notify us of device status changes during boot. + */ static void handle_output(unsigned long addr) { struct device *i; @@ -1037,18 +1106,32 @@ static void handle_output(unsigned long addr) for (i = devices.dev; i; i = i->next) { struct virtqueue *vq; - /* Notifications to device descriptors update device status. */ + /* + * Notifications to device descriptors mean they updated the + * device status. + */ if (from_guest_phys(addr) == i->desc) { update_device_status(i); return; } - /* Devices *can* be used before status is set to DRIVER_OK. */ + /* + * Devices *can* be used before status is set to DRIVER_OK. + * The original plan was that they would never do this: they + * would always finish setting up their status bits before + * actually touching the virtqueues. In practice, we allowed + * them to, and they do (eg. the disk probes for partition + * tables as part of initialization). + * + * If we see this, we start the device: once it's running, we + * expect the device to catch all the notifications. + */ for (vq = i->vq; vq; vq = vq->next) { if (addr != vq->config.pfn*getpagesize()) continue; if (i->running) errx(1, "Notification on running %s", i->name); + /* This just calls create_thread() for each virtqueue */ start_device(i); return; } @@ -1132,6 +1215,11 @@ static void add_virtqueue(struct device *dev, unsigned int num_descs, vq->next = NULL; vq->last_avail_idx = 0; vq->dev = dev; + + /* + * This is the routine the service thread will run, and its Process ID + * once it's running. + */ vq->service = service; vq->thread = (pid_t)-1; @@ -1202,7 +1290,8 @@ static void set_config(struct device *dev, unsigned len, const void *conf) /* * This routine does all the creation and setup of a new device, including - * calling new_dev_desc() to allocate the descriptor and device memory. + * calling new_dev_desc() to allocate the descriptor and device memory. We + * don't actually start the service threads until later. * * See what I mean about userspace being boring? */ @@ -1478,19 +1567,7 @@ static void setup_tun_net(char *arg) verbose("device %u: tun %s: %s\n", devices.device_num, tapif, arg); } - -/* - * Our block (disk) device should be really simple: the Guest asks for a block - * number and we read or write that position in the file. Unfortunately, that - * was amazingly slow: the Guest waits until the read is finished before - * running anything else, even if it could have been doing useful work. - * - * We could use async I/O, except it's reputed to suck so hard that characters - * actually go missing from your code when you try to use it. - * - * So this was one reason why lguest now does all virtqueue servicing in - * separate threads: it's more efficient and more like a real device. - */ +/*:*/ /* This hangs off device->priv. */ struct vblk_info @@ -1512,8 +1589,16 @@ struct vblk_info /*L:210 * The Disk * - * Remember that the block device is handled by a separate I/O thread. We head - * straight into the core of that thread here: + * The disk only has one virtqueue, so it only has one thread. It is really + * simple: the Guest asks for a block number and we read or write that position + * in the file. + * + * Before we serviced each virtqueue in a separate thread, that was unacceptably + * slow: the Guest waits until the read is finished before running anything + * else, even if it could have been doing useful work. + * + * We could have used async I/O, except it's reputed to suck so hard that + * characters actually go missing from your code when you try to use it. */ static void blk_request(struct virtqueue *vq) { @@ -1525,7 +1610,10 @@ static void blk_request(struct virtqueue *vq) struct iovec iov[vq->vring.num]; off64_t off; - /* Get the next request. */ + /* + * Get the next request, where we normally wait. It triggers the + * interrupt to acknowledge previously serviced requests (if any). + */ head = wait_for_vq_desc(vq, iov, &out_num, &in_num); /* @@ -1539,6 +1627,10 @@ static void blk_request(struct virtqueue *vq) out = convert(&iov[0], struct virtio_blk_outhdr); in = convert(&iov[out_num+in_num-1], u8); + /* + * For historical reasons, block operations are expressed in 512 byte + * "sectors". + */ off = out->sector * 512; /* @@ -1614,6 +1706,7 @@ static void blk_request(struct virtqueue *vq) if (out->type & VIRTIO_BLK_T_BARRIER) fdatasync(vblk->fd); + /* Finished that request. */ add_used(vq, head, wlen); } @@ -1682,9 +1775,8 @@ static void rng_input(struct virtqueue *vq) errx(1, "Output buffers in rng?"); /* - * This is why we convert to iovecs: the readv() call uses them, and so - * it reads straight into the Guest's buffer. We loop to make sure we - * fill it. + * Just like the console write, we loop to cover the whole iovec. + * In this case, short reads actually happen quite a bit. */ while (!iov_empty(iov, in_num)) { len = readv(rng_info->rfd, iov, in_num); @@ -1818,7 +1910,9 @@ int main(int argc, char *argv[]) devices.lastdev = NULL; devices.next_irq = 1; + /* We're CPU 0. In fact, that's the only CPU possible right now. */ cpu_id = 0; + /* * We need to know how much memory so we can set up the device * descriptor and memory pages for the devices as we parse the command @@ -1926,7 +2020,7 @@ int main(int argc, char *argv[]) */ tell_kernel(start); - /* Ensure that we terminate if a child dies. */ + /* Ensure that we terminate if a device-servicing child dies. */ signal(SIGCHLD, kill_launcher); /* If we exit via err(), this kills all the threads, restores tty. */ -- cgit v1.2.2 From 1842f23c05b6a866be831aa60bc8a8731c58ddd0 Mon Sep 17 00:00:00 2001 From: Rusty Russell Date: Thu, 30 Jul 2009 16:03:46 -0600 Subject: lguest and virtio: cleanup struct definitions to Linux style. I've been doing this for years, and akpm picked me up on it about 12 months ago. lguest partly serves as example code, so let's do it Right. Also, remove two unused fields in struct vblk_info in the example launcher. Signed-off-by: Rusty Russell Cc: Ingo Molnar --- Documentation/lguest/lguest.c | 21 +++++---------------- 1 file changed, 5 insertions(+), 16 deletions(-) (limited to 'Documentation') diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c index 45163651b519..950cde6d6e58 100644 --- a/Documentation/lguest/lguest.c +++ b/Documentation/lguest/lguest.c @@ -93,8 +93,7 @@ static int lguest_fd; static unsigned int __thread cpu_id; /* This is our list of devices. */ -struct device_list -{ +struct device_list { /* Counter to assign interrupt numbers. */ unsigned int next_irq; @@ -114,8 +113,7 @@ struct device_list static struct device_list devices; /* The device structure describes a single device. */ -struct device -{ +struct device { /* The linked-list pointer. */ struct device *next; @@ -140,8 +138,7 @@ struct device }; /* The virtqueue structure describes a queue attached to a device. */ -struct virtqueue -{ +struct virtqueue { struct virtqueue *next; /* Which device owns me. */ @@ -779,8 +776,7 @@ static void add_used_and_trigger(struct virtqueue *vq, unsigned head, int len) * * We associate some data with the console for our exit hack. */ -struct console_abort -{ +struct console_abort { /* How many times have they hit ^C? */ int count; /* When did they start? */ @@ -1570,20 +1566,13 @@ static void setup_tun_net(char *arg) /*:*/ /* This hangs off device->priv. */ -struct vblk_info -{ +struct vblk_info { /* The size of the file. */ off64_t len; /* The file descriptor for the file. */ int fd; - /* IO thread listens on this file descriptor [0]. */ - int workpipe[2]; - - /* IO thread writes to this file descriptor to mark it done, then - * Launcher triggers interrupt to Guest. */ - int done_fd; }; /*L:210 -- cgit v1.2.2 From e624859e7eb6ae2930df3923af73406dc6ccdad8 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Mon, 27 Jul 2009 22:11:59 +0100 Subject: ARM: 5624/1: Document cache aliasing region Augment the memory.txt file for ARM to list the cache aliasing region ffff4000-fffffff. Signed-off-by: Linus Walleij Signed-off-by: Russell King --- Documentation/arm/memory.txt | 2 ++ 1 file changed, 2 insertions(+) (limited to 'Documentation') diff --git a/Documentation/arm/memory.txt b/Documentation/arm/memory.txt index 43cb1004d35f..9d58c7c5eddd 100644 --- a/Documentation/arm/memory.txt +++ b/Documentation/arm/memory.txt @@ -21,6 +21,8 @@ ffff8000 ffffffff copy_user_page / clear_user_page use. For SA11xx and Xscale, this is used to setup a minicache mapping. +ffff4000 ffffffff cache aliasing on ARMv6 and later CPUs. + ffff1000 ffff7fff Reserved. Platforms must not use this address range. -- cgit v1.2.2 From 2a8aaacda5097fa92a39948da1b4c6614b6e150e Mon Sep 17 00:00:00 2001 From: Tobias Klauser Date: Thu, 30 Jul 2009 13:10:50 -0700 Subject: docbook: fix printk of ip address Use the %pI4 format string instead of %d.%d.%d.%d and NIPQUAD. Signed-off-by: Tobias Klauser Signed-off-by: Randy Dunlap Signed-off-by: David S. Miller --- Documentation/DocBook/kernel-hacking.tmpl | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/DocBook/kernel-hacking.tmpl b/Documentation/DocBook/kernel-hacking.tmpl index a50d6cd58573..992e67e6be7f 100644 --- a/Documentation/DocBook/kernel-hacking.tmpl +++ b/Documentation/DocBook/kernel-hacking.tmpl @@ -449,8 +449,8 @@ printk(KERN_INFO "i = %u\n", i); -__u32 ipaddress; -printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress)); +__be32 ipaddress; +printk(KERN_INFO "my ip: %pI4\n", &ipaddress); -- cgit v1.2.2 From 7e5f5fb09e6fc657f21816b5a18ba645a913368e Mon Sep 17 00:00:00 2001 From: "Martin K. Petersen" Date: Fri, 31 Jul 2009 11:49:13 -0400 Subject: block: Update topology documentation Update topology comments and sysfs documentation based upon discussions with Neil Brown. Signed-off-by: Martin K. Petersen Signed-off-by: Jens Axboe --- Documentation/ABI/testing/sysfs-block | 37 ++++++++++++++++++++++------------- 1 file changed, 23 insertions(+), 14 deletions(-) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block index cbbd3e069945..5f3bedaf8e35 100644 --- a/Documentation/ABI/testing/sysfs-block +++ b/Documentation/ABI/testing/sysfs-block @@ -94,28 +94,37 @@ What: /sys/block//queue/physical_block_size Date: May 2009 Contact: Martin K. Petersen Description: - This is the smallest unit the storage device can write - without resorting to read-modify-write operation. It is - usually the same as the logical block size but may be - bigger. One example is SATA drives with 4KB sectors - that expose a 512-byte logical block size to the - operating system. + This is the smallest unit a physical storage device can + write atomically. It is usually the same as the logical + block size but may be bigger. One example is SATA + drives with 4KB sectors that expose a 512-byte logical + block size to the operating system. For stacked block + devices the physical_block_size variable contains the + maximum physical_block_size of the component devices. What: /sys/block//queue/minimum_io_size Date: April 2009 Contact: Martin K. Petersen Description: - Storage devices may report a preferred minimum I/O size, - which is the smallest request the device can perform - without incurring a read-modify-write penalty. For disk - drives this is often the physical block size. For RAID - arrays it is often the stripe chunk size. + Storage devices may report a granularity or preferred + minimum I/O size which is the smallest request the + device can perform without incurring a performance + penalty. For disk drives this is often the physical + block size. For RAID arrays it is often the stripe + chunk size. A properly aligned multiple of + minimum_io_size is the preferred request size for + workloads where a high number of I/O operations is + desired. What: /sys/block//queue/optimal_io_size Date: April 2009 Contact: Martin K. Petersen Description: Storage devices may report an optimal I/O size, which is - the device's preferred unit of receiving I/O. This is - rarely reported for disk drives. For RAID devices it is - usually the stripe width or the internal block size. + the device's preferred unit for sustained I/O. This is + rarely reported for disk drives. For RAID arrays it is + usually the stripe width or the internal track size. A + properly aligned multiple of optimal_io_size is the + preferred request size for workloads where sustained + throughput is desired. If no optimal I/O size is + reported this file contains 0. -- cgit v1.2.2 From 1f6fc2de9525e34ee93bd392fa046369a8cfbf1e Mon Sep 17 00:00:00 2001 From: Henrique de Moraes Holschuh Date: Sat, 1 Aug 2009 12:04:18 -0300 Subject: thinkpad-acpi: remove dock and bay subdrivers The standard ACPI dock driver can handle the hotplug bays and docks of the ThinkPads just fine (including batteries) as of 2.6.27, and the code in thinkpad-acpi for the dock and bay subdrivers is currently broken anyway... Userspace needs some love to support the two-stage ejection nicely, but it is simple enough to do through udev rules (you don't even need HAL) so this wouldn't justify fixing the dock and bay subdrivers, either. That leaves warm-swap bays (_EJ3) support for thinkpad-acpi, as well as support for the weird dock of the model 570, but since such support has never left the "experimental" stage, it is also not a strong enough reason to find a way to fix this code. Users of ThinkPads with warm-swap bays are urged to request that _EJ3 support be added to the regular ACPI dock driver, if such feature is indeed useful for them. Signed-off-by: Henrique de Moraes Holschuh Signed-off-by: Len Brown --- Documentation/laptops/thinkpad-acpi.txt | 127 -------------------------------- 1 file changed, 127 deletions(-) (limited to 'Documentation') diff --git a/Documentation/laptops/thinkpad-acpi.txt b/Documentation/laptops/thinkpad-acpi.txt index f2296ecedb89..e2ddcdeb61b6 100644 --- a/Documentation/laptops/thinkpad-acpi.txt +++ b/Documentation/laptops/thinkpad-acpi.txt @@ -36,8 +36,6 @@ detailed description): - Bluetooth enable and disable - video output switching, expansion control - ThinkLight on and off - - limited docking and undocking - - UltraBay eject - CMOS/UCMS control - LED control - ACPI sounds @@ -729,131 +727,6 @@ cannot be read or if it is unknown, thinkpad-acpi will report it as "off". It is impossible to know if the status returned through sysfs is valid. -Docking / undocking -- /proc/acpi/ibm/dock ------------------------------------------- - -Docking and undocking (e.g. with the X4 UltraBase) requires some -actions to be taken by the operating system to safely make or break -the electrical connections with the dock. - -The docking feature of this driver generates the following ACPI events: - - ibm/dock GDCK 00000003 00000001 -- eject request - ibm/dock GDCK 00000003 00000002 -- undocked - ibm/dock GDCK 00000000 00000003 -- docked - -NOTE: These events will only be generated if the laptop was docked -when originally booted. This is due to the current lack of support for -hot plugging of devices in the Linux ACPI framework. If the laptop was -booted while not in the dock, the following message is shown in the -logs: - - Mar 17 01:42:34 aero kernel: thinkpad_acpi: dock device not present - -In this case, no dock-related events are generated but the dock and -undock commands described below still work. They can be executed -manually or triggered by Fn key combinations (see the example acpid -configuration files included in the driver tarball package available -on the web site). - -When the eject request button on the dock is pressed, the first event -above is generated. The handler for this event should issue the -following command: - - echo undock > /proc/acpi/ibm/dock - -After the LED on the dock goes off, it is safe to eject the laptop. -Note: if you pressed this key by mistake, go ahead and eject the -laptop, then dock it back in. Otherwise, the dock may not function as -expected. - -When the laptop is docked, the third event above is generated. The -handler for this event should issue the following command to fully -enable the dock: - - echo dock > /proc/acpi/ibm/dock - -The contents of the /proc/acpi/ibm/dock file shows the current status -of the dock, as provided by the ACPI framework. - -The docking support in this driver does not take care of enabling or -disabling any other devices you may have attached to the dock. For -example, a CD drive plugged into the UltraBase needs to be disabled or -enabled separately. See the provided example acpid configuration files -for how this can be accomplished. - -There is no support yet for PCI devices that may be attached to a -docking station, e.g. in the ThinkPad Dock II. The driver currently -does not recognize, enable or disable such devices. This means that -the only docking stations currently supported are the X-series -UltraBase docks and "dumb" port replicators like the Mini Dock (the -latter don't need any ACPI support, actually). - - -UltraBay eject -- /proc/acpi/ibm/bay ------------------------------------- - -Inserting or ejecting an UltraBay device requires some actions to be -taken by the operating system to safely make or break the electrical -connections with the device. - -This feature generates the following ACPI events: - - ibm/bay MSTR 00000003 00000000 -- eject request - ibm/bay MSTR 00000001 00000000 -- eject lever inserted - -NOTE: These events will only be generated if the UltraBay was present -when the laptop was originally booted (on the X series, the UltraBay -is in the dock, so it may not be present if the laptop was undocked). -This is due to the current lack of support for hot plugging of devices -in the Linux ACPI framework. If the laptop was booted without the -UltraBay, the following message is shown in the logs: - - Mar 17 01:42:34 aero kernel: thinkpad_acpi: bay device not present - -In this case, no bay-related events are generated but the eject -command described below still works. It can be executed manually or -triggered by a hot key combination. - -Sliding the eject lever generates the first event shown above. The -handler for this event should take whatever actions are necessary to -shut down the device in the UltraBay (e.g. call idectl), then issue -the following command: - - echo eject > /proc/acpi/ibm/bay - -After the LED on the UltraBay goes off, it is safe to pull out the -device. - -When the eject lever is inserted, the second event above is -generated. The handler for this event should take whatever actions are -necessary to enable the UltraBay device (e.g. call idectl). - -The contents of the /proc/acpi/ibm/bay file shows the current status -of the UltraBay, as provided by the ACPI framework. - -EXPERIMENTAL warm eject support on the 600e/x, A22p and A3x (To use -this feature, you need to supply the experimental=1 parameter when -loading the module): - -These models do not have a button near the UltraBay device to request -a hot eject but rather require the laptop to be put to sleep -(suspend-to-ram) before the bay device is ejected or inserted). -The sequence of steps to eject the device is as follows: - - echo eject > /proc/acpi/ibm/bay - put the ThinkPad to sleep - remove the drive - resume from sleep - cat /proc/acpi/ibm/bay should show that the drive was removed - -On the A3x, both the UltraBay 2000 and UltraBay Plus devices are -supported. Use "eject2" instead of "eject" for the second bay. - -Note: the UltraBay eject support on the 600e/x, A22p and A3x is -EXPERIMENTAL and may not work as expected. USE WITH CAUTION! - - CMOS/UCMS control ----------------- -- cgit v1.2.2