path: root/kernel
Commit message, Author, Age
* perf: Do the big rename: Performance Counters -> Performance Events  [Ingo Molnar, 2009-09-21]
        Bye-bye Performance Counters, welcome Performance Events!

        In the past few months the perfcounters subsystem has grown out of its initial role of counting hardware events, and has become (and is becoming) a much broader generic event enumeration, reporting, logging, monitoring, analysis facility.

        Naming its core object 'perf_counter' and naming the subsystem 'perfcounters' has become more and more of a misnomer. With pending code like hw-breakpoints support the 'counter' name is less and less appropriate.

        All in all, we've decided to rename the subsystem to 'performance events' and to propagate this rename through all fields, variables and API names. (in an ABI compatible fashion)

        The word 'event' is also a bit shorter than 'counter' - which makes it slightly more convenient to write/handle as well.

        Thanks go to Stephane Eranian who first observed this misnomer and suggested a rename.

        User-space tooling and ABI compatibility are not affected - this patch should be function-invariant. (Also, defconfigs were not touched to keep the size down.)

        This patch has been generated via the following script:

          FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

          sed -i \
            -e 's/PERF_EVENT_/PERF_RECORD_/g' \
            -e 's/PERF_COUNTER/PERF_EVENT/g' \
            -e 's/perf_counter/perf_event/g' \
            -e 's/nb_counters/nb_events/g' \
            -e 's/swcounter/swevent/g' \
            -e 's/tpcounter_event/tp_event/g' \
            $FILES

          for N in $(find . -name perf_counter.[ch]); do
            M=$(echo $N | sed 's/perf_counter/perf_event/g')
            mv $N $M
          done

          FILES=$(find . -name perf_event.*)

          sed -i \
            -e 's/COUNTER_MASK/REG_MASK/g' \
            -e 's/COUNTER/EVENT/g' \
            -e 's/\<event\>/event_id/g' \
            -e 's/counter/event/g' \
            -e 's/Counter/Event/g' \
            $FILES

        ... to keep it as correct as possible. This script can also be used by anyone who has pending perfcounters patches - it converts a Linux kernel tree over to the new naming. We tried to time this change to the point in time where the amount of pending patches is the smallest: the end of the merge window.

        Namespace clashes were fixed up in a preparatory patch - and some stylistic fallout will be fixed up in a subsequent patch.

        ( NOTE: 'counters' are still the proper terminology when we deal with hardware registers - and these sed scripts are a bit over-eager in renaming them. I've undone some of that, but in case there's something left where 'counter' would be better than 'event' we can undo that on an individual basis instead of touching an otherwise nicely automated patch. )

        Suggested-by: Stephane Eranian <eranian@google.com>
        Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
        Acked-by: Paul Mackerras <paulus@samba.org>
        Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
        Cc: Mike Galbraith <efault@gmx.de>
        Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
        Cc: Frederic Weisbecker <fweisbec@gmail.com>
        Cc: Steven Rostedt <rostedt@goodmis.org>
        Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
        Cc: David Howells <dhowells@redhat.com>
        Cc: Kyle McMartin <kyle@mcmartin.ca>
        Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
        Cc: "David S. Miller" <davem@davemloft.net>
        Cc: Thomas Gleixner <tglx@linutronix.de>
        Cc: "H. Peter Anvin" <hpa@zytor.com>
        Cc: <linux-arch@vger.kernel.org>
        LKML-Reference: <new-submission>
        Signed-off-by: Ingo Molnar <mingo@elte.hu>
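The order of the substitutions in the first sed pass matters: the old PERF_EVENT_ prefix has to be moved out of the way to PERF_RECORD_ before PERF_COUNTER is turned into PERF_EVENT. A minimal Python sketch of that ordering follows; it only models the six literal patterns from the script, and the function name apply_rules and the sample identifiers are illustrative assumptions, not part of the patch.

```python
# The six substitutions of the first sed pass, in the order given in the
# commit message. All patterns are literal strings, so str.replace behaves
# like the corresponding `s/old/new/g` expression.
FIRST_PASS = [
    ("PERF_EVENT_", "PERF_RECORD_"),
    ("PERF_COUNTER", "PERF_EVENT"),
    ("perf_counter", "perf_event"),
    ("nb_counters", "nb_events"),
    ("swcounter", "swevent"),
    ("tpcounter_event", "tp_event"),
]


def apply_rules(text, rules=FIRST_PASS):
    """Apply each (old, new) rule in sequence, like chained `sed -e` expressions."""
    for old, new in rules:
        text = text.replace(old, new)
    return text


# Hypothetical wrong ordering: the first two rules swapped.
SWAPPED = [FIRST_PASS[1], FIRST_PASS[0]] + FIRST_PASS[2:]

if __name__ == "__main__":
    for name in ("PERF_EVENT_MMAP", "PERF_COUNTER_IOC_ENABLE", "perf_counter_init"):
        print(name, "->", apply_rules(name), "| swapped order:", apply_rules(name, SWAPPED))
```

With the rules swapped, an identifier that starts as PERF_COUNTER_... is first renamed to PERF_EVENT_... and then caught by the PERF_EVENT_ rule, ending up in the PERF_RECORD_ namespace, which is why the script handles PERF_EVENT_ first.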
* perf_counter: Rename 'event' to event_id/hw_event  [Ingo Molnar, 2009-09-21]
        In preparation for the renames, to avoid a namespace clash.

        Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
        Cc: Paul Mackerras <paulus@samba.org>
        Cc: Frederic Weisbecker <fweisbec@gmail.com>
        LKML-Reference: <new-submission>
        Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter: Rename list_entry -> group_entry, counter_list -> group_list  [Ingo Molnar, 2009-09-21]
        This is in preparation for the big rename, but also makes sense in a standalone way: 'list_entry' is a bad name as we already have a list_entry() in list.h.

        Also, the 'counter list' is too vague, it doesn't tell us the purpose of that list.

        Clarify these names to show that it's all about the group hierarchy.

        Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
        Cc: Paul Mackerras <paulus@samba.org>
        Cc: Frederic Weisbecker <fweisbec@gmail.com>
        LKML-Reference: <new-submission>
        Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/jaswinder/linux-2.6  [Linus Torvalds, 2009-09-20]
        * git://git.kernel.org/pub/scm/linux/kernel/git/jaswinder/linux-2.6:
          includecheck fix: x86, cpu/common.c
          includecheck fix: kernel/trace, ring_buffer.c
          includecheck fix: include/linux, ftrace.h
          includecheck fix: include/linux, page_cgroup.h
          includecheck fix: include/linux, aio.h
          includecheck fix: include/drm, drm_memory.h
          includecheck fix: include/acpi, acpi_bus.h
          includecheck fix: drivers/xen, evtchn.c
          includecheck fix: drivers/video, vgacon.c
          includecheck fix: drivers/scsi, ibmvscsi.c
          includecheck fix: drivers/scsi, libfcoe.c
          includecheck fix: x86, shadow.c
          includecheck fix: x86, traps.c
          includecheck fix: um, helper.c
          includecheck fix: s390, sys_s390.c
| * includecheck fix: kernel/trace, ring_buffer.c  [Jaswinder Singh Rajput, 2009-09-20]
        fix the following 'make includecheck' warning:

          kernel/trace/ring_buffer.c: trace.h is included more than once.

        Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
        Cc: Steven Rostedt <rostedt@goodmis.org>
        Cc: Ingo Molnar <mingo@elte.hu>
        Cc: Sam Ravnborg <sam@ravnborg.org>
        LKML-Reference: <1247068617.4382.107.camel@ht.satnam>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6  [Linus Torvalds, 2009-09-20]
        * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (79 commits)
          USB serial: update the console driver
          usb-serial: straighten out serial_open
          usb-serial: add missing tests and debug lines
          usb-serial: rename subroutines
          usb-serial: fix termios initialization logic
          usb-serial: acquire references when a new tty is installed
          usb-serial: change logic of serial lookups
          usb-serial: put subroutines in logical order
          usb-serial: change referencing of port and serial structures
          tty: Char: mxser, use THRE for ASPP_OQUEUE ioctl
          tty: Char: mxser, add support for CP112UL
          uartlite: support shared interrupt lines
          tty: USB: serial/mct_u232, fix tty refcnt
          tty: riscom8, fix tty refcnt
          tty: riscom8, fix shutdown declaration
          TTY: fix typos
          tty: Power: fix suspend vt regression
          tty: vt: use printk_once
          tty: handle VT specific compat ioctls in vt driver
          n_tty: move echoctl check and clean up logic
          ...
| * | vt: remove power stuff from kernel/power  [Alan Cox, 2009-09-19]
        In the past someone gratuitously borrowed chunks of kernel internal vt code and dumped them in kernel/power. They have all sorts of deep relations with the vt code, so put them in the vt tree instead.

        Signed-off-by: Alan Cox <alan@linux.intel.com>
        Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | kfifo: Use "const" definitions  [Alan Cox, 2009-09-19]
        Currently kfifo cannot be used by parts of the kernel that use "const" properly as kfifo itself does not use const for passed data blocks which are indeed const.

        Signed-off-by: Alan Cox <alan@linux.intel.com>
        Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* | Merge branch 'perfcounters-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  [Linus Torvalds, 2009-09-20]
        * 'perfcounters-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (58 commits)
          perf_counter: Fix perf_copy_attr() pointer arithmetic
          perf utils: Use a define for the maximum length of a trace event
          perf: Add timechart help text and add timechart to "perf help"
          tracing, x86, cpuidle: Move the end point of a C state in the power tracer
          perf utils: Be consistent about minimum text size in the svghelper
          perf timechart: Add "perf timechart record"
          perf: Add the timechart tool
          perf: Add a SVG helper library file
          tracing, perf: Convert the power tracer into an event tracer
          perf: Add a sample_event type to the event_union
          perf: Allow perf utilities to have "callback" options without arguments
          perf: Store trace event name/id pairs in perf.data
          perf: Add a timestamp to fork events
          sched_clock: Make it NMI safe
          perf_counter: Fix up swcounter throttling
          x86, perf_counter, bts: Optimize BTS overflow handling
          perf sched: Add --input=file option to builtin-sched.c
          perf trace: Sample timestamp and cpu when using record flag
          perf tools: Increase MAX_EVENT_LENGTH
          perf tools: Fix memory leak in read_ftrace_printk()
          ...
| * perf_counter: Fix perf_copy_attr() pointer arithmetic  [Ian Schram, 2009-09-19]
        There is still some weird code in perf_copy_attr(), which supposedly checks that all bytes trailing a struct are zero.

        It doesn't seem to get pointer arithmetic right, since it increments an iterating pointer by sizeof(unsigned long) rather than 1.

        Signed-off-by: Ian Schram <ischram@telenet.be>
        [ v2: clean up the messy PTR_ALIGN logic as well. ]
        Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
        Cc: Mike Galbraith <efault@gmx.de>
        Cc: Paul Mackerras <paulus@samba.org>
        Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
        Cc: Frederic Weisbecker <fweisbec@gmail.com>
        Cc: <stable@kernel.org> # for v2.6.31.x
        LKML-Reference: <4AB3DEE2.3030600@telenet.be>
        Signed-off-by: Ingo Molnar <mingo@elte.hu>
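The effect of that stride mistake can be modelled outside the kernel. The Python sketch below is not the perf_copy_attr() code; it treats the trailing memory as a list of words and assumes a word-typed pointer, so that adding sizeof(unsigned long) (assumed to be 8 here) advances eight words per step instead of one. All names are illustrative.

```python
WORD_SIZE = 8  # assumption: sizeof(unsigned long) on a 64-bit build


def tail_is_zero(words, stride):
    """Return True if every visited word is zero, visiting indices 0, stride, 2*stride, ..."""
    return all(words[i] == 0 for i in range(0, len(words), stride))


def tail_is_zero_buggy(words):
    # Models `ptr += sizeof(unsigned long)` on a word pointer: most words are skipped.
    return tail_is_zero(words, WORD_SIZE)


def tail_is_zero_fixed(words):
    # Models `ptr += 1` (or `ptr++`) on a word pointer: every word is checked.
    return tail_is_zero(words, 1)


if __name__ == "__main__":
    tail = [0, 0, 7, 0, 0, 0, 0, 0, 0]  # a non-zero word at index 2
    print("buggy check says all zero:", tail_is_zero_buggy(tail))
    print("fixed check says all zero:", tail_is_zero_fixed(tail))
```

In this model the buggy check inspects only indices 0 and 8 and so accepts a tail that contains non-zero data, which is the class of error the commit message describes.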
| * tracing, perf: Convert the power tracer into an event tracer  [Arjan van de Ven, 2009-09-19]
        This patch converts the existing power tracer into an event tracer, so that power events (C states and frequency changes) can be tracked via "perf".

        This also removes the perl script that was used to demo the tracer; its functionality is being replaced entirely with timechart.

        Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
        Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
        Cc: Paul Mackerras <paulus@samba.org>
        Cc: Frederic Weisbecker <fweisbec@gmail.com>
        LKML-Reference: <20090912130542.6d314860@infradead.org>
        Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf: Add a timestamp to fork events  [Arjan van de Ven, 2009-09-19]
        perf timechart needs to know when a process forked, in order to be able to visualize properly when tasks start.

        This patch adds a time field to the event structure, and fills it in appropriately.

        Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
        Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
        Cc: Paul Mackerras <paulus@samba.org>
        Cc: Frederic Weisbecker <fweisbec@gmail.com>
        LKML-Reference: <20090912130341.51ad2de2@infradead.org>
        Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * Merge branch 'linus' into perfcounters/core  [Ingo Molnar, 2009-09-19]
        Merge reason: Bring in tracing changes we depend on.

        Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched_clock: Make it NMI safe  [Peter Zijlstra, 2009-09-18]
        Arjan complained about the suckyness of TSC on modern machines, and asked if we could do something about that for PERF_SAMPLE_TIME.

        Make cpu_clock() NMI safe by removing the spinlock and using cmpxchg. This also makes it smaller and more robust.

        Affects architectures that use HAVE_UNSTABLE_SCHED_CLOCK, i.e. IA64 and x86.

        Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
        LKML-Reference: <new-submission>
        Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | perf_counter: Fix up swcounter throttling  [Peter Zijlstra, 2009-09-18]
        /me dons the brown paper bag.

        Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
        LKML-Reference: <new-submission>
        Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | x86, perf_counter, bts: Optimize BTS overflow handling  [Markus Metzger, 2009-09-18]
        Draining the BTS buffer on a buffer overflow interrupt takes too long resulting in a kernel lockup when tracing the kernel.

        Restructure perf_counter sampling into sample creation and sample output.

        Prepare a single reference sample for BTS sampling and update the from and to address fields when draining the BTS buffer. Drain the entire BTS buffer between a single perf_output_begin() / perf_output_end() pair.

        Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
        Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
        LKML-Reference: <20090915130023.A16204@sedona.ch.intel.com>
        Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | perf_counter: Allow for a wakeup watermark  [Peter Zijlstra, 2009-09-17]
        Currently we wake the mmap() consumer once every PAGE_SIZE of data and/or once every wakeup_events events when specified.

        For high speed sampling this results in too many wakeups with respect to the buffer size, hence change this.

        We move the default wakeup limit to 1/4 of the buffer size, and provide a means to manually specify this limit.

        Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
        LKML-Reference: <new-submission>
        Signed-off-by: Ingo Molnar <mingo@elte.hu>
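To get a feel for the difference in wakeup counts, here is a back-of-the-envelope Python model. It only encodes what the message states (one wakeup per PAGE_SIZE before, a default limit of a quarter of the buffer after, or a manually chosen limit); the 4 KiB page size, the function names and the buffer sizes are assumptions for illustration, not the kernel implementation.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def wakeup_watermark(buffer_size, requested=None):
    """Bytes of new data between consumer wakeups under the new scheme."""
    if requested is not None:
        return requested          # manually specified limit
    return buffer_size // 4       # default: a quarter of the buffer


def wakeups_for(total_bytes, watermark):
    """Number of wakeups while total_bytes of samples are written."""
    return total_bytes // watermark


if __name__ == "__main__":
    buffer_size = 128 * PAGE_SIZE   # hypothetical 512 KiB mmap() buffer
    written = 1 << 20               # 1 MiB of sample data
    print("per-page wakeups:   ", wakeups_for(written, PAGE_SIZE))
    print("quarter-buffer:     ", wakeups_for(written, wakeup_watermark(buffer_size)))
    print("manual 64 KiB limit:", wakeups_for(written, wakeup_watermark(buffer_size, 65536)))
```

Under these assumed numbers the consumer is woken 256 times per MiB with the per-page scheme and 8 times with the quarter-buffer default.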
| * | perf_counter: Do not throttle single swcounter events  [Peter Zijlstra, 2009-09-17]
        We can have swcounter events that contribute more than a single count per event. When used with a non-zero period, those can generate multiple events, which is when we need throttling.

        However, swcounters that contribute only a single count per event can only come as fast as we can run code, hence don't throttle them.

        Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
        LKML-Reference: <new-submission>
        Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | perf_counter, sched: Add sched_stat_runtime tracepoint  [Ingo Molnar, 2009-09-13]
        This allows more precise tracking of how the scheduler accounts (and acts upon) a task having spent N nanoseconds of CPU time.

        Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
        Cc: Mike Galbraith <efault@gmx.de>
        Cc: Paul Mackerras <paulus@samba.org>
        Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
        Cc: Frederic Weisbecker <fweisbec@gmail.com>
        LKML-Reference: <new-submission>
        Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | perf_counter: Allow mmap if paranoid checks are turned off  [Ingo Molnar, 2009-09-13]
        Before:

          $ perf sched record -f sleep 1
          Error: failed to mmap with 1 (Operation not permitted)

        After:

          $ perf sched record -f sleep 1
          [ perf record: Captured and wrote 0.095 MB perf.data (~4161 samples) ]

        Note, this is only allowed if perfcounter_paranoid is set to the most permissive (non-default) value of -1.

        Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
        Cc: Mike Galbraith <efault@gmx.de>
        Cc: Paul Mackerras <paulus@samba.org>
        Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
        Cc: Frederic Weisbecker <fweisbec@gmail.com>
        LKML-Reference: <new-submission>
        Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | headers: taskstats_kern.h trim  [Alexey Dobriyan, 2009-09-18]
        Remove net/genetlink.h inclusion, now sched.c won't be recompiled because of some networking changes.

        Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
        Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  [Linus Torvalds, 2009-09-18]
        * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (34 commits)
          time: Prevent 32 bit overflow with set_normalized_timespec()
          clocksource: Delay clocksource down rating to late boot
          clocksource: clocksource_select must be called with mutex locked
          clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash
          timers: Drop a function prototype
          clocksource: Resolve cpu hotplug dead lock with TSC unstable
          timer.c: Fix S/390 comments
          timekeeping: Fix invalid getboottime() value
          timekeeping: Fix up read_persistent_clock() breakage on sh
          timekeeping: Increase granularity of read_persistent_clock(), build fix
          time: Introduce CLOCK_REALTIME_COARSE
          x86: Do not unregister PIT clocksource on PIT oneshot setup/shutdown
          clocksource: Avoid clocksource watchdog circular locking dependency
          clocksource: Protect the watchdog rating changes with clocksource_mutex
          clocksource: Call clocksource_change_rating() outside of watchdog_lock
          timekeeping: Introduce read_boot_clock
          timekeeping: Increase granularity of read_persistent_clock()
          timekeeping: Update clocksource with stop_machine
          timekeeping: Add timekeeper read_clock helper functions
          timekeeping: Move NTP adjusted clock multiplier to struct timekeeper
          ...

        Fix trivial conflict due to MIPS lemote -> loongson renaming.
| * | time: Prevent 32 bit overflow with set_normalized_timespec()  [Thomas Gleixner, 2009-09-15]
        The nsec argument of set_normalized_timespec() is of type long. The recent timekeeping changes of ktime_get_ts() feed

          ts->tv_nsec + tomono.tv_nsec + nsecs

        to set_normalized_timespec(). On 32 bit machines that sum can be larger than (1 << 31) and therefore result in a negative value which screws up the result completely.

        Make the nsec argument of set_normalized_timespec() s64 to fix the problem at hand. This also prevents similar problems for future users of set_normalized_timespec().

        Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
        Tested-by: Carsten Emde <carsten.emde@osadl.org>
        LKML-Reference: <new-submission>
        Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
        Cc: John Stultz <johnstul@us.ibm.com>
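The arithmetic behind that overflow can be worked through in a few lines of Python. This is a user-space model, not the kernel function: to_s32() imitates what happens when the sum is squeezed into a 32-bit long, and normalize() is a simplified stand-in for the carry logic; both names and the sample values are assumptions for illustration.

```python
NSEC_PER_SEC = 1_000_000_000


def to_s32(value):
    """Wrap an integer into the signed 32-bit range, as a 32-bit `long` would hold it."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def normalize(sec, nsec):
    """Carry whole seconds out of nsec so that 0 <= nsec < NSEC_PER_SEC."""
    sec += nsec // NSEC_PER_SEC
    nsec %= NSEC_PER_SEC
    return sec, nsec


if __name__ == "__main__":
    # Three nanosecond terms, each below one second, whose sum exceeds 1 << 31.
    total_nsec = 900_000_000 + 900_000_000 + 900_000_000
    print("64-bit argument:", normalize(10, total_nsec))
    print("32-bit argument:", normalize(10, to_s32(total_nsec)))
```

With a 64-bit argument the result is 12 s + 700,000,000 ns; once the sum is wrapped into 32 bits it becomes negative and the normalized time lands at 8 s + 405,032,704 ns, roughly four seconds off.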
| * | clocksource: Delay clocksource down rating to late boot  [Thomas Gleixner, 2009-09-14]
        The down rating of clock sources in the early boot process via the clock source watchdog mechanism can happen way before the per cpu event queues are initialized. This leads to a boot crash on x86 when the TSC is marked unstable in the SMP bring up.

        The selection of a clock source for time keeping happens in the late boot process so we can safely delay the list manipulation until clocksource_done_booting() is called.

        Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
        LKML-Reference: <new-submission>
        Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | clocksource: clocksource_select must be called with mutex locked  [Thomas Gleixner, 2009-09-14]
        The callers of clocksource_select must hold clocksource_mutex to protect the clocksource_list.

        Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
        LKML-Reference: <new-submission>
        Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash  [Martin Schwidefsky, 2009-09-11]
        The watchdog timer is started after the watchdog clocksource and at least one watched clocksource have been registered. The clocksource work element watchdog_work is initialized just before the clocksource timer is started. This is too late for the clocksource_mark_unstable call from native_cpu_up. To fix this use a static initializer for watchdog_work.

        This resolves a boot crash reported by multiple people.

        Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
        Cc: Jens Axboe <jens.axboe@oracle.com>
        Cc: John Stultz <johnstul@us.ibm.com>
        LKML-Reference: <20090911153305.3fe9a361@skybase>
        Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | clocksource: Resolve cpu hotplug dead lock with TSC unstable  [Thomas Gleixner, 2009-08-28]
        Martin Schwidefsky analyzed it:

        To register a clocksource the clocksource_mutex is acquired and, if necessary, timekeeping_notify is called to install the clocksource as the timekeeper clock. timekeeping_notify uses stop_machine, which needs to take the cpu_add_remove_lock mutex.

        Starting a new cpu is done with the cpu_add_remove_lock mutex held. native_cpu_up checks the tsc of the new cpu and, if the tsc is no good, clocksource_change_rating is called, which needs the clocksource_mutex, and the deadlock is complete.

        The solution is to replace the TSC via the clocksource watchdog mechanism. Mark the TSC as unstable and schedule the watchdog work so it gets removed in the watchdog thread context.

        Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
        LKML-Reference: <new-submission>
        Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
        Cc: John Stultz <johnstul@us.ibm.com>
| * | timer.c: Fix S/390 comments  [Randy Dunlap, 2009-08-26]
        Fix typos and add omitted words.

        Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
        Cc: akpm <akpm@linux-foundation.org>
        Cc: linux390@de.ibm.com
        Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
        Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
        LKML-Reference: <20090825143541.43fc2ed8.randy.dunlap@oracle.com>
        Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | timekeeping: Fix invalid getboottime() value  [Hiroshi Shimamoto, 2009-08-25]
        Don't use timespec_add_safe() with wall_to_monotonic, because wall_to_monotonic has negative values which will cause overflow in timespec_add_safe(). That makes btime in /proc/stat invalid.

        Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
        Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
        Cc: John Stultz <johnstul@us.ibm.com>
        Cc: Daniel Walker <dwalker@fifo99.com>
        LKML-Reference: <4A937FDE.4050506@ct.jp.nec.com>
        Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | time: Introduce CLOCK_REALTIME_COARSE  [John Stultz, 2009-08-21]
        After talking with some application writers who want very fast, but not fine-grained timestamps, I decided to try to implement new clock_ids for clock_gettime(): CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE, which return the time at the last tick. This is very fast as we don't have to access any hardware (which can be very painful if you're using something like the acpi_pm clocksource), and we can even use the vdso clock_gettime() method to avoid the syscall. The only trade-off is that you only get low-res, tick-grained time resolution.

        This isn't a new idea: I know Ingo has a patch in the -rt tree that made the vsyscall gettimeofday() return coarse-grained time when the vsyscall64 sysctl was set to 2. However, this affects all applications on a system.

        With this method, applications can choose the proper speed/granularity trade-off for themselves.

        Signed-off-by: John Stultz <johnstul@us.ibm.com>
        Cc: Andi Kleen <andi@firstfloor.org>
        Cc: nikolag@ca.ibm.com
        Cc: Darren Hart <dvhltc@us.ibm.com>
        Cc: arjan@infradead.org
        Cc: jonathan@jonmasters.org
        LKML-Reference: <1250734414.6897.5.camel@localhost.localdomain>
        Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
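From user space the new clock ids are passed to clock_gettime() like any other. A hedged Python sketch follows: the numeric ids 5 and 6 are the Linux values from <linux/time.h> and are written out explicitly rather than taken from Python's time module; the helper name coarse_realtime() and its fallback to time.time() on other platforms are assumptions of this example.

```python
import sys
import time

# Linux clock ids from <linux/time.h>, spelled out as plain integers.
CLOCK_REALTIME_COARSE = 5
CLOCK_MONOTONIC_COARSE = 6


def coarse_realtime():
    """Wall-clock seconds from the coarse clock; falls back to time.time() elsewhere."""
    clock_gettime = getattr(time, "clock_gettime", None)
    if clock_gettime is None or not sys.platform.startswith("linux"):
        return time.time()
    try:
        return clock_gettime(CLOCK_REALTIME_COARSE)
    except OSError:
        # Kernel (or sandbox) without the coarse clocks.
        return time.time()


if __name__ == "__main__":
    fine = time.time()
    coarse = coarse_realtime()
    # The coarse value reflects the last tick, so it may lag slightly behind.
    print("fine:  ", fine)
    print("coarse:", coarse)
```

An application that stamps many log records per second could call coarse_realtime() on the hot path and keep the fine-grained clock for the places where sub-tick resolution matters.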
| * | clocksource: Avoid clocksource watchdog circular locking dependency  [Martin Schwidefsky, 2009-08-19]
        stop_machine from a multithreaded workqueue is not allowed because of a circular locking dependency between cpu_down and the workqueue execution. Use a kernel thread to do the clocksource downgrade.

        Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
        Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
        Cc: john stultz <johnstul@us.ibm.com>
        LKML-Reference: <20090818170942.3ab80c91@skybase>
        Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | clocksource: Protect the watchdog rating changes with clocksource_mutex  [Thomas Gleixner, 2009-08-19]
        Martin pointed out that commit 6ea41d2529 (clocksource: Call clocksource_change_rating() outside of watchdog_lock) has a theoretical reference count problem. The calls to clocksource_change_rating() are now done outside of the clocksource mutex and outside of the watchdog lock. A concurrent clocksource_unregister() could remove the clock.

        Split out the code which changes the rating from clocksource_change_rating() into __clocksource_change_rating(). Protect the clocksource_watchdog_work() code sequence with the clocksource_mutex() and call __clocksource_change_rating().

        LKML-Reference: <alpine.LFD.2.00.0908171038420.2782@localhost.localdomain>
        Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
        Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | clocksource: Call clocksource_change_rating() outside of watchdog_lock  [Thomas Gleixner, 2009-08-15]
        The changes to the watchdog logic introduced a lock inversion between watchdog_lock and clocksource_mutex. Change the rating outside of watchdog_lock to avoid it.

        Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | timekeeping: Introduce read_boot_clock  [Martin Schwidefsky, 2009-08-15]
        Add the new function read_boot_clock to get the exact time the system has been started. For architectures without support for exact boot time a new weak function is added that returns 0. Use the exact boot time to initialize wall_to_monotonic, or xtime if the read_boot_clock returned 0.

        Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
        Cc: Ingo Molnar <mingo@elte.hu>
        Acked-by: John Stultz <johnstul@us.ibm.com>
        Cc: Daniel Walker <dwalker@fifo99.com>
        LKML-Reference: <20090814134811.296703241@de.ibm.com>
        Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | timekeeping: Increase granularity of read_persistent_clock()  [Martin Schwidefsky, 2009-08-15]
        The persistent clock of some architectures (e.g. s390) has a better granularity than seconds. To reduce the delta between the host clock and the guest clock in a virtualized system, change the read_persistent_clock function to return a struct timespec.

        Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
        Cc: Ingo Molnar <mingo@elte.hu>
        Acked-by: John Stultz <johnstul@us.ibm.com>
        Cc: Daniel Walker <dwalker@fifo99.com>
        LKML-Reference: <20090814134811.013873340@de.ibm.com>
        Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | timekeeping: Update clocksource with stop_machine  [Martin Schwidefsky, 2009-08-15]
        update_wall_time calls change_clocksource HZ times per second to check if a new clock source is available. In close to 100% of all calls there is no new clock. Replace the tick based check by an update done with stop_machine.

        Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
        Cc: Ingo Molnar <mingo@elte.hu>
        Acked-by: John Stultz <johnstul@us.ibm.com>
        Cc: Daniel Walker <dwalker@fifo99.com>
        LKML-Reference: <20090814134810.711836357@de.ibm.com>
        Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | timekeeping: Add timekeeper read_clock helper functions  [Martin Schwidefsky, 2009-08-15]
        Add timekeeper_read_clock_ntp and timekeeper_read_clock_raw and use them for getnstimeofday, ktime_get, ktime_get_ts and getrawmonotonic.

        Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
        Cc: Ingo Molnar <mingo@elte.hu>
        Acked-by: John Stultz <johnstul@us.ibm.com>
        Cc: Daniel Walker <dwalker@fifo99.com>
        LKML-Reference: <20090814134810.435105711@de.ibm.com>
        Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | timekeeping: Move NTP adjusted clock multiplier to struct timekeeper  [Martin Schwidefsky, 2009-08-15]
        The clocksource structure has two multipliers, the unmodified multiplier clock->mult_orig and the NTP corrected multiplier clock->mult. The NTP multiplier is misplaced in the struct clocksource, this is private information of the timekeeping code. Add the mult field to the struct timekeeper to contain the NTP corrected value, keep the unmodified multiplier in clock->mult and remove clock->mult_orig.

        Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
        Cc: Ingo Molnar <mingo@elte.hu>
        Acked-by: John Stultz <johnstul@us.ibm.com>
        Cc: Daniel Walker <dwalker@fifo99.com>
        LKML-Reference: <20090814134810.149047645@de.ibm.com>
        Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | timekeeping: Add xtime_shift and ntp_error_shift to struct timekeeper  [Martin Schwidefsky, 2009-08-15]
        The xtime_nsec value in the timekeeper structure is shifted by a few bits to improve precision. This happens to be the same value as the clock->shift. To improve readability add xtime_shift to the timekeeper and use it instead of the clock->shift. Likewise add ntp_error_shift and replace all (NTP_SCALE_SHIFT - clock->shift) expressions.

        Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
        Cc: Ingo Molnar <mingo@elte.hu>
        Acked-by: John Stultz <johnstul@us.ibm.com>
        Cc: Daniel Walker <dwalker@fifo99.com>
        LKML-Reference: <20090814134809.871899606@de.ibm.com>
        Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
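Why a shifted nanosecond accumulator improves precision can be seen with the usual mult/shift fixed-point conversion, ns = (cycles * mult) >> shift. The Python sketch below is a simplified model: hz_to_mult() uses plain floor division (no rounding), and the 3 MHz counter, shift of 20 and function names are assumptions chosen for illustration rather than values from the patch.

```python
NSEC_PER_SEC = 1_000_000_000


def hz_to_mult(freq_hz, shift):
    """Pick mult so that (cycles * mult) >> shift approximates nanoseconds."""
    return (NSEC_PER_SEC << shift) // freq_hz


def cycles_to_ns(cycles, mult, shift):
    return (cycles * mult) >> shift


def accumulate_truncating(cycle_deltas, mult, shift):
    """Convert every delta to whole nanoseconds first, dropping the fraction each time."""
    return sum((d * mult) >> shift for d in cycle_deltas)


def accumulate_shifted(cycle_deltas, mult, shift):
    """Keep the running total in (nanoseconds << shift) and shift only when reading."""
    acc = 0
    for d in cycle_deltas:
        acc += d * mult
    return acc >> shift


if __name__ == "__main__":
    shift = 20
    mult = hz_to_mult(3_000_000, shift)   # hypothetical 3 MHz counter, 333.33 ns per cycle
    deltas = [1] * 30                      # thirty single-cycle updates, 10 us in total
    print("truncating each step:", accumulate_truncating(deltas, mult, shift), "ns")
    print("shifted accumulator: ", accumulate_shifted(deltas, mult, shift), "ns")
```

Truncating on every update loses a third of a nanosecond per cycle in this example (9990 ns instead of roughly 10000 ns), while the shifted accumulator keeps the fractional bits and reports 9999 ns.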
| * | timekeeping: Introduce struct timekeeper  [Martin Schwidefsky, 2009-08-15]
        Add struct timekeeper to keep the internal values timekeeping.c needs in regard to the currently selected clock source. This moves the timekeeping intervals, xtime_nsec and the ntp error value from struct clocksource to struct timekeeper. The raw_time is removed from the clocksource as well. It gets treated like xtime as a global variable. Eventually xtime and raw_time should be moved to struct timekeeper.

        [ tglx: minor cleanup ]

        Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
        Cc: Ingo Molnar <mingo@elte.hu>
        Acked-by: John Stultz <johnstul@us.ibm.com>
        Cc: Daniel Walker <dwalker@fifo99.com>
        LKML-Reference: <20090814134809.613209842@de.ibm.com>
        Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | clocksource: Move watchdog downgrade to a work queue thread  [Martin Schwidefsky, 2009-08-15]
        Move the downgrade of an unstable clocksource from the timer interrupt context into the process context of a work queue thread. This is needed to be able to do the clocksource switch with stop_machine.

        Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
        Cc: Ingo Molnar <mingo@elte.hu>
        Acked-by: John Stultz <johnstul@us.ibm.com>
        Cc: Daniel Walker <dwalker@fifo99.com>
        LKML-Reference: <20090814134809.354926067@de.ibm.com>
        Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | clocksource: Refactor clocksource watchdog  Martin Schwidefsky  2009-08-15
| | |
| | | Refactor clocksource watchdog code to make it more readable. Add clocksource_dequeue_watchdog to remove a clocksource from the watchdog list when it is unregistered.
| | |
| | | Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| | | Cc: Ingo Molnar <mingo@elte.hu>
| | | Acked-by: John Stultz <johnstul@us.ibm.com>
| | | Cc: Daniel Walker <dwalker@fifo99.com>
| | | LKML-Reference: <20090814134809.110881699@de.ibm.com>
| | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | clocksource: Simplify clocksource watchdog resume logic  Martin Schwidefsky  2009-08-15
| | |
| | | To resume the clocksource watchdog just remove the CLOCK_SOURCE_WATCHDOG bit from the watched clocksource.
| | |
| | | Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| | | Cc: Ingo Molnar <mingo@elte.hu>
| | | Acked-by: John Stultz <johnstul@us.ibm.com>
| | | Cc: Daniel Walker <dwalker@fifo99.com>
| | | LKML-Reference: <20090814134808.880925790@de.ibm.com>
| | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | clocksource: Delay clocksource watchdog highres enablement  Martin Schwidefsky  2009-08-15
| | |
| | | The clocksource watchdog marks a clock as highres capable before it has checked the deviation from the watchdog clocksource even once. Make sure that the deviation is checked at least once before doing the switch to highres mode.
| | |
| | | Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| | | Cc: Ingo Molnar <mingo@elte.hu>
| | | Acked-by: John Stultz <johnstul@us.ibm.com>
| | | Cc: Daniel Walker <dwalker@fifo99.com>
| | | LKML-Reference: <20090814134808.627795883@de.ibm.com>
| | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | clocksource: Cleanup clocksource selection  Martin Schwidefsky  2009-08-15
| | |
| | | If a non-high-resolution clocksource is first set as the override clock and then registered, it becomes active even if the system is in one-shot mode. Move the override check from sysfs_override_clocksource to the clocksource selection. That fixes the bug and simplifies the code. The check in clocksource_register for double registration of the same clocksource is removed without replacement.
| | |
| | | To find the initial clocksource, a new weak function in jiffies.c is defined that returns the jiffies clocksource. The architecture code can then override the weak function with a more suitable clocksource, e.g. the TOD clock on s390.
| | |
| | | [ tglx: Folded in a fix from John Stultz ]
| | |
| | | Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| | | Acked-by: John Stultz <johnstul@us.ibm.com>
| | | Cc: Daniel Walker <dwalker@fifo99.com>
| | | LKML-Reference: <20090814134808.388024160@de.ibm.com>
| | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | timekeeping: Move reset of cycle_last for tsc clocksource to tsc  Martin Schwidefsky  2009-08-15
| | |
| | | change_clocksource resets the cycle_last value to zero then sets it to a value read from the clocksource. The reset to zero is required only for the TSC clocksource to make the read_tsc function work after a resume. The reason is that the TSC read function uses cycle_last to detect backwards going TSCs. In the resume case cycle_last contains the TSC value from the last update before the suspend. On resume the TSC starts counting from 0 again and would trip over the cycle_last comparison.
| | |
| | | This is subtle and surprising. Move the reset to a resume function in the tsc code.
| | |
| | | Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| | | Acked-by: Thomas Gleixner <tglx@linutronix.de>
| | | Acked-by: John Stultz <johnstul@us.ibm.com>
| | | Cc: Daniel Walker <dwalker@fifo99.com>
| | | LKML-Reference: <20090814134808.142191175@de.ibm.com>
| | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
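The interaction between the backwards check and resume can be shown with a small user-space sketch. It assumes a read function that clamps against the last recorded value, as the message describes; the variable and the function names ending in `_sketch` are hypothetical and the hardware counter is passed in as a plain argument rather than read from a register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the clocksource's cycle_last field. */
static uint64_t cycle_last_sketch;

/* Record the counter value seen at the last timekeeping update. */
static void update_cycle_last_sketch(uint64_t now)
{
    assert(now >= cycle_last_sketch);  /* updates are expected to move forward */
    cycle_last_sketch = now;
}

/*
 * Return the counter value, clamped so that it never appears to run
 * backwards relative to the last update.
 */
static uint64_t read_counter_sketch(uint64_t hw_value)
{
    return hw_value >= cycle_last_sketch ? hw_value : cycle_last_sketch;
}

/*
 * Resume hook: the hardware counter restarts near zero, so the stale
 * pre-suspend value must be discarded or every read would be clamped.
 */
static void resume_counter_sketch(void)
{
    cycle_last_sketch = 0;
}
```

If cycle_last holds 1000000 from before the suspend and the counter reads 50 after resume, the clamped read keeps returning 1000000 until the resume hook clears the stale value. Placing the reset in a resume function keeps that detail next to the read function that depends on it.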
| * | timekeeping: Remove clocksource inline functions  Martin Schwidefsky  2009-08-15
| | |
| | | The three inline functions clocksource_read, clocksource_enable and clocksource_disable are simple wrappers of an indirect call plus the copy from and to the mult_orig value. The functions are exclusively used by the timekeeping code which has intimate knowledge of the clocksource anyway. Therefore remove the inline functions. No functional change.
| | |
| | | Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| | | Acked-by: John Stultz <johnstul@us.ibm.com>
| | | Cc: Daniel Walker <dwalker@fifo99.com>
| | | LKML-Reference: <20090814134807.903108946@de.ibm.com>
| | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | timekeeping: Introduce timekeeping_leap_insert  John Stultz  2009-08-15
| | |
| | | Move the adjustment of xtime, wall_to_monotonic and the update of the vsyscall variables to the timekeeping code.
| | |
| | | Signed-off-by: John Stultz <johnstul@us.ibm.com>
| | | Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| | | LKML-Reference: <20090814134807.609730216@de.ibm.com>
| | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | Merge branch 'linus' into timers/core  Thomas Gleixner  2009-08-14
| |\ \
| | | |
| | | | Reason: Martin's timekeeping cleanup series depends on both timers/core and mainline changes.
| | | |
| | | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | timers: Cache __next_timer_interrupt result  Martin Schwidefsky  2009-08-04
| | | |
| | | | Each time a cpu goes to sleep on a NOHZ=y system the timer wheel is searched for the next timer interrupt. It can take quite a few cycles to find the next pending timer. This patch adds a field to tvec_base that caches the result of __next_timer_interrupt. The hit ratio is around 80% on my thinkpad under normal use; on a server I've seen hit ratios from 5% to 95% depending on the workload.
| | | |
| | | | -v2: jiffies wrap fixes
| | | |
| | | | Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| | | | Acked-by: Thomas Gleixner <tglx@linutronix.de>
| | | | Cc: john stultz <johnstul@us.ibm.com>
| | | | Cc: Venki Pallipadi <venkatesh.pallipadi@intel.com>
| | | | LKML-Reference: <20090721202505.7d56a079@skybase>
| | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
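The caching idea can be sketched in isolation. This is a heavily simplified model, not the patch itself: a flat array stands in for the timer wheel, timers never expire on their own, and all names ending in `_sketch` (including the `scans` counter used to observe cache hits) are hypothetical. The comparison helper uses the wrap-safe signed-difference idiom so that a wrapping jiffies-style counter still orders correctly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TIMERS_SKETCH 8

/* Hypothetical, flattened stand-in for a per-cpu timer base. */
struct timer_base_sketch {
    unsigned long expires[MAX_TIMERS_SKETCH];
    bool active[MAX_TIMERS_SKETCH];
    unsigned long next_timer;   /* cached earliest expiry */
    bool cache_valid;
    unsigned long scans;        /* number of full searches performed */
};

/* Wrap-safe "a is earlier than b" for a free-running tick counter. */
static bool time_before_sketch(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

static void add_timer_sketch(struct timer_base_sketch *b, size_t slot,
                             unsigned long expires)
{
    assert(slot < MAX_TIMERS_SKETCH && !b->active[slot]);
    b->expires[slot] = expires;
    b->active[slot] = true;
    /* A timer earlier than the cached one just replaces the cached value. */
    if (b->cache_valid && time_before_sketch(expires, b->next_timer))
        b->next_timer = expires;
}

static void del_timer_sketch(struct timer_base_sketch *b, size_t slot)
{
    assert(slot < MAX_TIMERS_SKETCH);
    /* Removing the earliest timer makes the cached value stale. */
    if (b->active[slot] && b->cache_valid &&
        b->expires[slot] == b->next_timer)
        b->cache_valid = false;
    b->active[slot] = false;
}

/* Return true and store the earliest expiry if any timer is pending. */
static bool next_timer_sketch(struct timer_base_sketch *b, unsigned long *out)
{
    if (!b->cache_valid) {
        bool found = false;

        b->scans++;  /* the expensive path: search every slot */
        for (size_t i = 0; i < MAX_TIMERS_SKETCH; i++) {
            if (!b->active[i])
                continue;
            if (!found || time_before_sketch(b->expires[i], b->next_timer)) {
                b->next_timer = b->expires[i];
                found = true;
            }
        }
        b->cache_valid = found;
        if (!found)
            return false;
    }
    *out = b->next_timer;
    return true;
}
```

In this model a repeated query with no intervening changes is answered from the cached field, adding an earlier timer updates the cache without a search, and only deleting the earliest timer forces a new search. The hit ratio therefore depends on how often the earliest timer changes, which is consistent with the workload-dependent ratios quoted in the message.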