
hrtimers - subsystem for high-resolution kernel timers
----------------------------------------------------

This patch introduces a new subsystem for high-resolution kernel timers.

One might ask the question: we already have a timer subsystem
(kernel/timers.c), why do we need two timer subsystems? After a lot of
back and forth trying to integrate high-resolution and high-precision
features into the existing timer framework, and after testing various
such high-resolution timer implementations in practice, we came to the
conclusion that the timer wheel code is fundamentally not suitable for
such an approach. We initially didn't believe this ('there must be a way
to solve this'), and spent a considerable effort trying to integrate
things into the timer wheel, but we failed. In hindsight, there are
several reasons why such integration is hard/impossible:

- the forced handling of low-resolution and high-resolution timers in
  the same way leads to a lot of compromises, macro magic and #ifdef
  mess. The timers.c code is very "tightly coded" around jiffies and
  32-bitness assumptions, and has been honed and micro-optimized for a
  relatively narrow use case (jiffies in a relatively narrow HZ range)
  for many years - and thus even small extensions to it easily break
  the wheel concept, leading to even worse compromises. The timer wheel
  code is very good and tight code, there are zero problems with it in its
  current usage - but it is simply not suitable to be extended for
  high-res timers.

- the unpredictable [O(N)] overhead of cascading leads to delays which
  necessitate a more complex handling of high resolution timers, which
  in turn decreases robustness. Such a design still led to rather large
  timing inaccuracies. Cascading is a fundamental property of the timer
  wheel concept, it cannot be 'designed out' without inevitably
  degrading other portions of the timers.c code in an unacceptable way.

- the implementation of the current posix-timer subsystem on top of
  the timer wheel has already introduced a quite complex handling of
  the required readjusting of absolute CLOCK_REALTIME timers at
  settimeofday or NTP time - further underlining, by example, our
  experience that the timer wheel data structure is too rigid for
  high-res timers.

- the timer wheel code is best suited to use cases which can be
  identified as "timeouts". Such timeouts are usually set up to cover
  error conditions in various I/O paths, such as networking and block
  I/O. The vast majority of those timers never expire and are rarely
  recascaded because the expected correct event arrives in time so they
  can be removed from the timer wheel before any further processing of
  them becomes necessary. Thus the users of these timeouts can accept
  the granularity and precision tradeoffs of the timer wheel, and
  largely expect the timer subsystem to have near-zero overhead.
  Accurate timing for them is not a core purpose - in fact most of the
  timeout values used are ad-hoc. For them it is at most a necessary
  evil to guarantee the processing of actual timeout completions
  (because most of the timeouts are deleted before completion), which
  should thus be as cheap and unintrusive as possible.

The primary users of precision timers are user-space applications that
utilize nanosleep, posix-timers and itimer interfaces. Also, in-kernel
users like drivers and subsystems which require precise timed events
(e.g. multimedia) can benefit from the availability of a separate
high-resolution timer subsystem as well.

While this subsystem does not offer high-resolution clock sources just
yet, the hrtimer subsystem can be easily extended with high-resolution
clock capabilities, and patches for that exist and are maturing quickly.
The increasing demand for realtime and multimedia applications along
with other potential users for precise timers gives another reason to
separate the "timeout" and "precise timer" subsystems.

Another potential benefit is that such a separation allows even more
special-purpose optimization of the existing timer wheel for the low
resolution and low precision use cases - once the precision-sensitive
APIs are separated from the timer wheel and are migrated over to
hrtimers. E.g. we could decrease the frequency of the timeout subsystem
from 250 Hz to 100 Hz (or even lower).

hrtimer subsystem implementation details
----------------------------------------

The basic design considerations were:

- simplicity

- data structure not bound to jiffies or any other granularity. All the
  kernel logic works at 64-bit nanoseconds resolution - no compromises.

- simplification of existing, timing related kernel code

Another basic requirement was the immediate enqueueing and ordering of
timers at activation time. After looking at several possible solutions
such as radix trees and hashes, we chose the red-black tree as the basic
data structure. Rbtrees are available as a library in the kernel and are
used in various performance-critical areas of e.g. memory management and
file systems. The rbtree is solely used for time sorted ordering, while
a separate list is used to give the expiry code fast access to the
queued timers, without having to walk the rbtree.

(This separate list is also useful for later when we'll introduce
high-resolution clocks, where we need separate pending and expired
queues while keeping the time-order intact.)
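
The time-ordered enqueueing described above can be illustrated with a
minimal userspace sketch. It is not the kernel code: it uses a plain
(unbalanced) binary search tree instead of the kernel rbtree library, and a
cached pointer to the earliest timer stands in for the separate list that
gives the expiry code fast access. All names (sketch_timer, sketch_queue,
sketch_enqueue) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a queued timer; not the kernel's struct hrtimer. */
struct sketch_timer {
	int64_t expires_ns;		/* absolute expiry in nanoseconds */
	struct sketch_timer *left, *right;
};

struct sketch_queue {
	struct sketch_timer *root;	/* time-sorted tree of armed timers */
	struct sketch_timer *first;	/* earliest timer, cached for expiry */
};

/*
 * Insert a timer at activation time so that the queue is always sorted
 * by expiry. A real implementation would rebalance the tree; this
 * sketch leaves it unbalanced to stay short.
 */
static void sketch_enqueue(struct sketch_queue *q, struct sketch_timer *t)
{
	struct sketch_timer **link;
	int leftmost = 1;

	assert(q != NULL && t != NULL);
	link = &q->root;
	t->left = t->right = NULL;

	while (*link) {
		if (t->expires_ns < (*link)->expires_ns) {
			link = &(*link)->left;
		} else {
			/* equal expiries go right, so earlier arrivals fire first */
			link = &(*link)->right;
			leftmost = 0;
		}
	}
	*link = t;

	/* only a timer that always descended left can be the new earliest */
	if (leftmost)
		q->first = t;
}
```

With this layout the expiry path only has to look at q->first to decide
whether anything is due; it never walks the tree.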

Time-ordered enqueueing is not purely for the purposes of
high-resolution clocks though, it also simplifies the handling of
absolute timers based on a low-resolution CLOCK_REALTIME. The existing
implementation needed to keep an extra list of all armed absolute
CLOCK_REALTIME timers along with complex locking. In case of
settimeofday and NTP, all the timers (!) had to be dequeued, the
time-changing code had to fix them up one by one, and all of them had to
be enqueued again. The time-ordered enqueueing and the storage of the
expiry time in absolute time units removes all this complex and poorly
scaling code from the posix-timer implementation - the clock can simply
be set without having to touch the rbtree. This also makes the handling
of posix-timers simpler in general.
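
Why storing the expiry in absolute time units makes clock setting cheap can
be shown with a small sketch. It assumes, purely for illustration, that the
wall clock is modelled as a monotonic time plus a settable offset; the names
sketch_clock, sketch_clock_set and sketch_expired are hypothetical and do not
exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wall clock: monotonic time plus a settable offset. */
struct sketch_clock {
	int64_t mono_ns;	/* never jumps */
	int64_t offset_ns;	/* changed by "settimeofday" */
};

static int64_t sketch_clock_now(const struct sketch_clock *c)
{
	assert(c != NULL);
	return c->mono_ns + c->offset_ns;
}

/* Setting the clock only changes the offset; no queued timer is touched. */
static void sketch_clock_set(struct sketch_clock *c, int64_t new_now_ns)
{
	assert(c != NULL);
	c->offset_ns = new_now_ns - c->mono_ns;
}

/* A timer is due once the clock reaches its stored absolute expiry. */
static bool sketch_expired(const struct sketch_clock *c, int64_t expires_ns)
{
	return expires_ns <= sketch_clock_now(c);
}
```

Because the stored expiry values never change, their relative order in the
time-sorted queue stays valid after the clock is set; only the comparison
against the current time gives a different answer.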

The locking and per-CPU behavior of hrtimers was mostly taken from the
existing timer wheel code, as it is mature and well suited. Sharing code
was not really a win, due to the different data structures. Also, the
hrtimer functions now have clearer behavior and clearer names - such as
hrtimer_try_to_cancel() and hrtimer_cancel() [which are roughly
equivalent to del_timer() and del_timer_sync()] - so there's no direct
1:1 mapping between them on the algorithmic level, and thus no real
potential for code sharing either.

Basic data types: every time value, absolute or relative, is in a
special nanosecond-resolution type: ktime_t. The kernel-internal
representation of ktime_t values and operations is implemented via
macros and inline functions, and can be switched between a "hybrid
union" type and a plain "scalar" 64-bit nanoseconds representation (at
compile time). The hybrid union type optimizes time conversions on 32-bit
CPUs. This build-time-selectable ktime_t storage format was implemented
to avoid the performance impact of 64-bit multiplications and divisions
on 32-bit CPUs. Such operations are frequently necessary to convert
between the storage formats provided by kernel and userspace interfaces
and the internal time format. (See include/linux/ktime.h for further
details.)
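
As a rough illustration of the plain "scalar" representation, the sketch
below converts between a userspace struct timespec and a signed 64-bit
nanosecond count. The type and function names (sketch_ktime_t and friends)
are hypothetical and are not the macros from include/linux/ktime.h; the
sketch also assumes non-negative time values.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define SKETCH_NSEC_PER_SEC 1000000000LL

/* Hypothetical scalar time type: signed 64-bit nanoseconds. */
typedef int64_t sketch_ktime_t;

/* seconds + nanoseconds -> scalar; needs one 64-bit multiplication */
static sketch_ktime_t sketch_timespec_to_ktime(struct timespec ts)
{
	assert(ts.tv_sec >= 0);
	assert(ts.tv_nsec >= 0 && ts.tv_nsec < SKETCH_NSEC_PER_SEC);
	return (sketch_ktime_t)ts.tv_sec * SKETCH_NSEC_PER_SEC + ts.tv_nsec;
}

/* scalar -> seconds + nanoseconds; needs a 64-bit division and modulo */
static struct timespec sketch_ktime_to_timespec(sketch_ktime_t kt)
{
	struct timespec ts;

	assert(kt >= 0);
	ts.tv_sec = (time_t)(kt / SKETCH_NSEC_PER_SEC);
	ts.tv_nsec = (long)(kt % SKETCH_NSEC_PER_SEC);
	return ts;
}
```

The multiplication and division in these two helpers are exactly the kind of
64-bit operations that are comparatively expensive on 32-bit CPUs, which is
the cost the hybrid union format is meant to avoid.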

hrtimers - rounding of timer values
-----------------------------------

The hrtimer code will round timer events to lower-resolution clocks
because it has to. Otherwise it will do no artificial rounding at all.

One question is what resolution value should be returned to the user by
the clock_getres() interface. It will return whatever real resolution
a given clock has - be it low-res, high-res, or artificially-low-res.
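
A minimal sketch of such rounding follows. The text above does not say in
which direction values are rounded; the sketch assumes rounding up to the
next multiple of the clock resolution, so that a timer never fires early.
The function name sketch_round_up_ns is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round an expiry time up to the next multiple of the clock resolution.
 * Assumption: rounding goes upwards. Both values are in nanoseconds.
 */
static int64_t sketch_round_up_ns(int64_t expires_ns, int64_t res_ns)
{
	int64_t rem;

	assert(res_ns > 0 && expires_ns >= 0);
	rem = expires_ns % res_ns;
	return rem ? expires_ns + (res_ns - rem) : expires_ns;
}
```

For example, with a 4 ms resolution (a 250 Hz tick) an expiry of 10.5 ms is
moved to 12 ms, while an expiry that already lies on a tick boundary is left
unchanged. With a resolution of 1 ns the function does no rounding at all.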

hrtimers - testing and verification
-----------------------------------

We used the high-resolution clock subsystem on top of hrtimers to verify
the hrtimer implementation details in practice, and we also ran the posix
timer tests in order to ensure specification compliance. We also ran
tests on low-resolution clocks.

The hrtimer patch converts the following kernel functionality to use
hrtimers:

 - nanosleep
 - itimers
 - posix-timers

The conversion of nanosleep and posix-timers enabled the unification of
nanosleep and clock_nanosleep.

The code was successfully compiled for the following platforms:

 i386, x86_64, ARM, PPC, PPC64, IA64

The code was run-tested on the following platforms:

 i386(UP/SMP), x86_64(UP/SMP), ARM, PPC

hrtimers were also integrated into the -rt tree, along with a
hrtimers-based high-resolution clock implementation, so the hrtimers
code got a healthy amount of testing and use in practice.

	Thomas Gleixner, Ingo Molnar
/*
 * Common code for Palm LD, T5, TX, Z72
 *
 * Copyright (C) 2010-2011 Marek Vasut <marek.vasut@gmail.com>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 *
 */

#include <linux/platform_device.h>
#include <linux/delay.h>
#include <linux/irq.h>
#include <linux/gpio_keys.h>
#include <linux/input.h>
#include <linux/pda_power.h>
#include <linux/pwm_backlight.h>
#include <linux/gpio.h>
#include <linux/wm97xx.h>
#include <linux/power_supply.h>
#include <linux/usb/gpio_vbus.h>
#include <linux/regulator/max1586.h>
#include <linux/i2c/pxa-i2c.h>

#include <asm/mach-types.h>
#include <asm/mach/arch.h>
#include <asm/mach/map.h>

#include <mach/pxa27x.h>
#include <mach/audio.h>
#include <mach/mmc.h>
#include <mach/pxafb.h>
#include <mach/irda.h>
#include <mach/udc.h>
#include <mach/palmasoc.h>
#include <mach/palm27x.h>

#include "generic.h"
#include "devices.h"

/******************************************************************************
 * SD/MMC card controller
 ******************************************************************************/
#if defined(CONFIG_MMC_PXA) || defined(CONFIG_MMC_PXA_MODULE)
static struct pxamci_platform_data palm27x_mci_platform_data = {
	.ocr_mask		= MMC_VDD_32_33 | MMC_VDD_33_34,
	.detect_delay_ms	= 200,
};

void __init palm27x_mmc_init(int detect, int ro, int power,
					int power_inverted)
{
	palm27x_mci_platform_data.gpio_card_detect	= detect;
	palm27x_mci_platform_data.gpio_card_ro		= ro;
	palm27x_mci_platform_data.gpio_power		= power;
	palm27x_mci_platform_data.gpio_power_invert	= power_inverted;

	pxa_set_mci_info(&palm27x_mci_platform_data);
}
#endif

/******************************************************************************
 * Power management - standby
 ******************************************************************************/
#if defined(CONFIG_SUSPEND)
void __init palm27x_pm_init(unsigned long str_base)
{
	static const unsigned long resume[] = {
		0xe3a00101,	/* mov	r0,	#0x40000000 */
		0xe380060f,	/* orr	r0, r0, #0x00f00000 */
		0xe590f008,	/* ldr	pc, [r0, #0x08] */
	};

	/*
	 * Copy the bootloader.
	 * NOTE: PalmZ72 uses a different wakeup method!
	 */
	memcpy(phys_to_virt(str_base), resume, sizeof(resume));
}
#endif

/******************************************************************************
 * Framebuffer
 ******************************************************************************/
#if defined(CONFIG_FB_PXA) || defined(CONFIG_FB_PXA_MODULE)
struct pxafb_mode_info palm_320x480_lcd_mode = {
	.pixclock	= 57692,
	.xres		= 320,
	.yres		= 480,
	.bpp		= 16,

	.left_margin	= 32,
	.right_margin	= 1,
	.upper_margin	= 7,
	.lower_margin	= 1,

	.hsync_len	= 4,
	.vsync_len	= 1,
};

struct pxafb_mode_info palm_320x320_lcd_mode = {
	.pixclock	= 115384,
	.xres		= 320,
	.yres		= 320,
	.bpp		= 16,

	.left_margin	= 27,
	.right_margin	= 7,
	.upper_margin	= 7,
	.lower_margin	= 8,

	.hsync_len	= 6,
	.vsync_len	= 1,
};

struct pxafb_mode_info palm_320x320_new_lcd_mode = {
	.pixclock	= 86538,
	.xres		= 320,
	.yres		= 320,
	.bpp		= 16,

	.left_margin	= 20,
	.right_margin	= 8,
	.upper_margin	= 8,
	.lower_margin	= 5,

	.hsync_len	= 4,
	.vsync_len	= 1,
};

static struct pxafb_mach_info palm27x_lcd_screen = {
	.num_modes	= 1,
	.lcd_conn	= LCD_COLOR_TFT_16BPP | LCD_PCLK_EDGE_FALL,
};

static int palm27x_lcd_power;
static void palm27x_lcd_ctl(int on, struct fb_var_screeninfo *info)
{
	gpio_set_value(palm27x_lcd_power, on);
}

void __init palm27x_lcd_init(int power, struct pxafb_mode_info *mode)
{
	palm27x_lcd_screen.modes = mode;

	if (gpio_is_valid(power)) {
		/* gpio_request() and gpio_direction_output() return 0 on success */
		if (gpio_request(power, "LCD power")) {
			pr_err("Palm27x: failed to claim lcd power gpio!\n");
			return;
		}
		if (gpio_direction_output(power, 1)) {
			pr_err("Palm27x: lcd power configuration failed!\n");
			gpio_free(power);
			return;
		}
		palm27x_lcd_power = power;
		palm27x_lcd_screen.pxafb_lcd_power = palm27x_lcd_ctl;
	}

	pxa_set_fb_info(NULL, &palm27x_lcd_screen);
}
#endif

/******************************************************************************
 * USB Gadget
 ******************************************************************************/
#if	defined(CONFIG_USB_PXA27X) || \
	defined(CONFIG_USB_PXA27X_MODULE)
static struct gpio_vbus_mach_info palm27x_udc_info = {
	.gpio_vbus_inverted	= 1,
};

static struct platform_device palm27x_gpio_vbus = {
	.name	= "gpio-vbus",
	.id	= -1,
	.dev	= {
		.platform_data	= &palm27x_udc_info,
	},
};

void __init palm27x_udc_init(int vbus, int pullup, int vbus_inverted)
{
	palm27x_udc_info.gpio_vbus	= vbus;
	palm27x_udc_info.gpio_pullup	= pullup;

	palm27x_udc_info.gpio_vbus_inverted = vbus_inverted;

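	/* Bail out if the pullup GPIO cannot be claimed. */
	if (gpio_request(pullup, "USB Pullup"))
		return;

	/* Set the initial pullup level, then release the GPIO again. */
	gpio_direction_output(pullup, palm27x_udc_info.gpio_vbus_inverted);
	gpio_free(pullup);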

	platform_device_register(&palm27x_gpio_vbus);
}
#endif

/******************************************************************************
 * IrDA
 ******************************************************************************/
#if defined(CONFIG_IRDA) || defined(CONFIG_IRDA_MODULE)
static struct pxaficp_platform_data palm27x_ficp_platform_data = {
	.transceiver_cap	= IR_SIRMODE | IR_OFF,
};

void __init palm27x_irda_init(int pwdn)
{
	palm27x_ficp_platform_data.gpio_pwdown = pwdn;
	pxa_set_ficp_info(&palm27x_ficp_platform_data);
}
#endif

/******************************************************************************
 * WM97xx audio, battery
 ******************************************************************************/
#if	defined(CONFIG_TOUCHSCREEN_WM97XX) || \
	defined(CONFIG_TOUCHSCREEN_WM97XX_MODULE)
static struct wm97xx_batt_pdata palm27x_batt_pdata = {
	.batt_aux	= WM97XX_AUX_ID3,
	.temp_aux	= WM97XX_AUX_ID2,
	.charge_gpio	= -1,
	.batt_mult	= 1000,
	.batt_div	= 414,
	.temp_mult	= 1,
	.temp_div	= 1,
	.batt_tech	= POWER_SUPPLY_TECHNOLOGY_LIPO,
	.batt_name	= "main-batt",
};

static struct wm97xx_pdata palm27x_wm97xx_pdata = {
	.batt_pdata	= &palm27x_batt_pdata,
};

static pxa2xx_audio_ops_t palm27x_ac97_pdata = {
	.codec_pdata	= { &palm27x_wm97xx_pdata, },
};

static struct palm27x_asoc_info palm27x_asoc_pdata = {
	.jack_gpio	= -1,
};

static struct platform_device palm27x_asoc = {
	.name = "palm27x-asoc",
	.id   = -1,
	.dev  = {
		.platform_data = &palm27x_asoc_pdata,
	},
};

void __init palm27x_ac97_init(int minv, int maxv, int jack, int reset)
{
	palm27x_ac97_pdata.reset_gpio	= reset;
	palm27x_asoc_pdata.jack_gpio	= jack;

	if (minv < 0 || maxv < 0) {
		palm27x_ac97_pdata.codec_pdata[0] = NULL;
		pxa_set_ac97_info(&palm27x_ac97_pdata);
	} else {
		palm27x_batt_pdata.min_voltage	= minv;
		palm27x_batt_pdata.max_voltage	= maxv;

		pxa_set_ac97_info(&palm27x_ac97_pdata);
		platform_device_register(&palm27x_asoc);
	}
}
#endif

/******************************************************************************
 * Backlight
 ******************************************************************************/
#if defined(CONFIG_BACKLIGHT_PWM) || defined(CONFIG_BACKLIGHT_PWM_MODULE)
static int palm_bl_power;
static int palm_lcd_power;

static int palm27x_backlight_init(struct device *dev)
{
	int ret;

	ret = gpio_request(palm_bl_power, "BL POWER");
	if (ret)
		goto err;
	ret = gpio_direction_output(palm_bl_power, 0);
	if (ret)
		goto err2;

	if (gpio_is_valid(palm_lcd_power)) {
		ret = gpio_request(palm_lcd_power, "LCD POWER");
		if (ret)
			goto err2;
		ret = gpio_direction_output(palm_lcd_power, 0);
		if (ret)
			goto err3;
	}

	return 0;
err3:
	gpio_free(palm_lcd_power);
err2:
	gpio_free(palm_bl_power);
err:
	return ret;
}

static int palm27x_backlight_notify(struct device *dev, int brightness)
{
	gpio_set_value(palm_bl_power, brightness);
	if (gpio_is_valid(palm_lcd_power))
		gpio_set_value(palm_lcd_power, brightness);
	return brightness;
}

static void palm27x_backlight_exit(struct device *dev)
{
	gpio_free(palm_bl_power);
	if (gpio_is_valid(palm_lcd_power))
		gpio_free(palm_lcd_power);
}

static struct platform_pwm_backlight_data palm27x_backlight_data = {
	.pwm_id		= 0,
	.max_brightness	= 0xfe,
	.dft_brightness	= 0x7e,
	.pwm_period_ns	= 3500 * 1024,
	.init		= palm27x_backlight_init,
	.notify		= palm27x_backlight_notify,
	.exit		= palm27x_backlight_exit,
};

static struct platform_device palm27x_backlight = {
	.name	= "pwm-backlight",
	.dev	= {
		.parent		= &pxa27x_device_pwm0.dev,
		.platform_data	= &palm27x_backlight_data,
	},
};

void __init palm27x_pwm_init(int bl, int lcd)
{
	palm_bl_power	= bl;
	palm_lcd_power	= lcd;
	platform_device_register(&palm27x_backlight);
}
#endif

/******************************************************************************
 * Power supply
 ******************************************************************************/
#if defined(CONFIG_PDA_POWER) || defined(CONFIG_PDA_POWER_MODULE)
static int palm_ac_state;
static int palm_usb_state;

static int palm27x_power_supply_init(struct device *dev)
{
	int ret;

	ret = gpio_request(palm_ac_state, "AC state");
	if (ret)
		goto err1;
	ret = gpio_direction_input(palm_ac_state);
	if (ret)
		goto err2;

	if (gpio_is_valid(palm_usb_state)) {
		ret = gpio_request(palm_usb_state, "USB state");
		if (ret)
			goto err2;
		ret = gpio_direction_input(palm_usb_state);
		if (ret)
			goto err3;
	}

	return 0;
err3:
	gpio_free(palm_usb_state);
err2:
	gpio_free(palm_ac_state);
err1:
	return ret;
}

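static void palm27x_power_supply_exit(struct device *dev)
{
	/* The USB state GPIO is only requested in init when it is valid. */
	if (gpio_is_valid(palm_usb_state))
		gpio_free(palm_usb_state);
	gpio_free(palm_ac_state);
}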

static int palm27x_is_ac_online(void)
{
	return gpio_get_value(palm_ac_state);
}

static int palm27x_is_usb_online(void)
{
	return !gpio_get_value(palm_usb_state);
}

static char *palm27x_supplicants[] = {
	"main-battery",
};

static struct pda_power_pdata palm27x_ps_info = {
	.init			= palm27x_power_supply_init,
	.exit			= palm27x_power_supply_exit,
	.is_ac_online		= palm27x_is_ac_online,
	.is_usb_online		= palm27x_is_usb_online,
	.supplied_to		= palm27x_supplicants,
	.num_supplicants	= ARRAY_SIZE(palm27x_supplicants),
};

static struct platform_device palm27x_power_supply = {
	.name = "pda-power",
	.id   = -1,
	.dev  = {
		.platform_data = &palm27x_ps_info,
	},
};

void __init palm27x_power_init(int ac, int usb)
{
	palm_ac_state	= ac;
	palm_usb_state	= usb;
	platform_device_register(&palm27x_power_supply);
}
#endif

/******************************************************************************
 * Core power regulator
 ******************************************************************************/
#if defined(CONFIG_REGULATOR_MAX1586) || \
    defined(CONFIG_REGULATOR_MAX1586_MODULE)
static struct regulator_consumer_supply palm27x_max1587a_consumers[] = {
	{
		.supply	= "vcc_core",
	}
};

static struct regulator_init_data palm27x_max1587a_v3_info = {
	.constraints = {
		.name		= "vcc_core range",
		.min_uV		= 900000,
		.max_uV		= 1705000,
		.always_on	= 1,
		.valid_ops_mask	= REGULATOR_CHANGE_VOLTAGE,
	},
	.consumer_supplies	= palm27x_max1587a_consumers,
	.num_consumer_supplies	= ARRAY_SIZE(palm27x_max1587a_consumers),
};

static struct max1586_subdev_data palm27x_max1587a_subdevs[] = {
	{
		.name		= "vcc_core",
		.id		= MAX1586_V3,
		.platform_data	= &palm27x_max1587a_v3_info,
	}
};

static struct max1586_platform_data palm27x_max1587a_info = {
	.subdevs     = palm27x_max1587a_subdevs,
	.num_subdevs = ARRAY_SIZE(palm27x_max1587a_subdevs),
	.v3_gain     = MAX1586_GAIN_R24_3k32, /* 730..1550 mV */
};

static struct i2c_board_info __initdata palm27x_pi2c_board_info[] = {
	{
		I2C_BOARD_INFO("max1586", 0x14),
		.platform_data	= &palm27x_max1587a_info,
	},
};

static struct i2c_pxa_platform_data palm27x_i2c_power_info = {
	.use_pio	= 1,
};

void __init palm27x_pmic_init(void)
{
	i2c_register_board_info(1, ARRAY_AND_SIZE(palm27x_pi2c_board_info));
	pxa27x_set_i2c_power_info(&palm27x_i2c_power_info);
}
#endif