litmus-rt.git/kernel, branch wip-extra-debug

DBG: add additional tracing

2011-01-05T13:13:18+00:00

This is not meant to be merged into master...

DBG: enable additional debug tracing

2011-01-05T13:13:18+00:00

Workaround: do not set rq->skip_clock_update

2010-11-11T23:00:33+00:00

Disabling the clock update seems to be causing problems even in normal
Linux, and causes major bugs under LITMUS^RT. As a workaround, just
disable this "optimization" for now.

Details: the idle load balancer causes tasks that suspend to be
marked with set_tsk_need_resched(). When such a task resumes, it may
wrongly trigger the setting of skip_clock_update. However, a
corresponding rescheduling event may not happen immediately, such that
the currently-scheduled task is no longer charged for its execution
time.

Implement proper remote preemption support

2010-11-11T22:57:44+00:00

To date, Litmus has just hooked into the smp_send_reschedule() IPI
handler and marked tasks as having to reschedule to implement remote
preemptions. This was never particularly clean, but so far we got away
with it. However, changes in the underlying Linux, and peculiarities
of the ARM code (interrupts enabled before context switch) break this
naive approach. This patch introduces new state-machine based remote
preemption support. By examining the local state before calling
set_tsk_need_resched(), we avoid confusing the underlying Linux
scheduler. Further, this patch avoids sending unnecessary IPIs.
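
The idea of checking state before requesting a reschedule can be
sketched as follows. This is a simplified, single-threaded
illustration, not the state machine from the patch: the state names,
the action names, and all three functions are hypothetical, and real
code would need atomic state transitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU rescheduling states. */
enum resched_state {
	RS_SETTLED,          /* the scheduled task is the right one */
	RS_IPI_PENDING,      /* a remote CPU has been sent an IPI */
	RS_RESCHED_PENDING   /* need_resched is set, schedule() will run */
};

/* What the caller has to do after a request. */
enum resched_action {
	ACT_NONE,
	ACT_SET_NEED_RESCHED,
	ACT_SEND_IPI
};

/* Request a preemption; only the first request triggers any work. */
static enum resched_action request_resched(enum resched_state *st,
					   int target_is_local)
{
	assert(st != NULL);
	if (*st != RS_SETTLED)
		return ACT_NONE;  /* already in progress: no duplicate IPI */
	if (target_is_local) {
		*st = RS_RESCHED_PENDING;
		return ACT_SET_NEED_RESCHED;
	}
	*st = RS_IPI_PENDING;
	return ACT_SEND_IPI;
}

/* IPI handler on the target CPU: mark the task only if still needed. */
static enum resched_action handle_resched_ipi(enum resched_state *st)
{
	assert(st != NULL);
	if (*st != RS_IPI_PENDING)
		return ACT_NONE;
	*st = RS_RESCHED_PENDING;
	return ACT_SET_NEED_RESCHED;
}

/* Called once the scheduler has picked the next task. */
static void resched_done(enum resched_state *st)
{
	assert(st != NULL);
	*st = RS_SETTLED;
}
```

In this sketch a second request that arrives while one is pending
returns ACT_NONE, which is how redundant IPIs are avoided.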

hook litmus tick function into hrtimer-driven ticks

2010-11-11T22:57:43+00:00

Litmus plugins should also be activated if ticks are triggered by
hrtimer.

Merge commit 'v2.6.36' into wip-merge-2.6.36

2010-10-23T05:01:49+00:00

Conflicts:
	Makefile
	arch/x86/include/asm/unistd_32.h
	arch/x86/kernel/syscall_table_32.S
	kernel/sched.c
	kernel/time/tick-sched.c

Relevant API and function changes (resolved in this commit):
- (API) .enqueue_task() (enqueue_task_litmus),
  .dequeue_task() (dequeue_task_litmus)
  [litmus/sched_litmus.c]
- (API) .select_task_rq() (select_task_rq_litmus)
  [litmus/sched_litmus.c]
- (API) sysrq_dump_trace_buffer() and sysrq_handle_kill_rt_tasks()
  [litmus/sched_trace.c]
- struct kfifo internal buffer name changed (buffer -> buf)
  [litmus/sched_trace.c]
- add_wait_queue_exclusive_locked -> __add_wait_queue_tail_exclusive
  [litmus/fmlp.c]
- syscall numbers for both x86_32 and x86_64

hrtimer: add init function to properly set hrtimer_start_on_info params

2010-10-19T13:40:38+00:00

This helper function is also useful to remind us that if we use
hrtimer_pull outside the scope of triggering remote releases, we need to
take care to properly set the "state" field of the hrtimer_start_on_info
structure.

sysctl: min/max bounds are optional

2010-10-15T21:42:24+00:00

sysctl check complains with a WARN() when proc_doulongvec_minmax() or
proc_doulongvec_ms_jiffies_minmax() are used by a vector of longs (with
more than one element), with no min or max value specified.

This is unexpected, given we had a bug on this min/max handling :)

Reported-by: Jiri Slaby 
Signed-off-by: Eric Dumazet 
Cc: "Eric W. Biederman" 
Cc: David Miller 
Acked-by: WANG Cong 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

hrtimer: Preserve timer state in remove_hrtimer()

2010-10-14T11:29:59+00:00

The race is described as follows:

CPU X                                 CPU Y
remove_hrtimer
// state & QUEUED == 0
timer->state = CALLBACK
unlock timer base
timer->f(n) //very long
                                  hrtimer_start
                                    lock timer base
                                    remove_hrtimer // no effect
                                    hrtimer_enqueue
                                    timer->state = CALLBACK |
                                                   QUEUED
                                    unlock timer base
                                  hrtimer_start
                                    lock timer base
                                    remove_hrtimer
                                        mode = INACTIVE
                                        // CALLBACK bit lost!
                                    switch_hrtimer_base
                                            CALLBACK bit not set:
                                                    timer->base
                                                    changes to a
                                                    different CPU.
lock this CPU's timer base

The bug was introduced with commit ca109491f (hrtimer: removing all ur
callback modes) in 2.6.29
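
The lost bit in the diagram above can be reproduced with plain bit
operations. This is a minimal sketch, not the patched function: the
STATE_* constants and the three helpers are illustrative names, and
the model assumes the timer state is a small bit mask in which the
CALLBACK bit blocks base migration.

```c
#include <assert.h>

/* Illustrative state bits (names are assumptions for this sketch). */
#define STATE_INACTIVE 0x00u
#define STATE_ENQUEUED 0x01u
#define STATE_CALLBACK 0x02u

/* Old behaviour: removal resets the state, dropping CALLBACK. */
static unsigned remove_timer_lossy(unsigned state)
{
	(void)state;
	return STATE_INACTIVE;
}

/* Fixed behaviour: removal clears ENQUEUED but keeps CALLBACK. */
static unsigned remove_timer_preserving(unsigned state)
{
	return state & STATE_CALLBACK;
}

/* A timer may move to another CPU's base only if no callback runs. */
static int may_switch_base(unsigned state)
{
	assert((state & ~(STATE_ENQUEUED | STATE_CALLBACK)) == 0);
	return !(state & STATE_CALLBACK);
}
```

Starting from CALLBACK | ENQUEUED, the lossy variant yields a state
that permits switching the base while the callback is still running;
the preserving variant does not.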

[ tglx: Feed new state via local variable and add a comment. ]

Signed-off-by: Salman Qazi 
Cc: akpm@linux-foundation.org
Cc: Peter Zijlstra 
LKML-Reference: <20101012142351.8485.21823.stgit@dungbeetle.mtv.corp.google.com>
Signed-off-by: Thomas Gleixner 
Cc: stable@kernel.org

ring-buffer: Fix typo of time extends per page

2010-10-12T16:06:43+00:00

Time stamps for the ring buffer are created by the difference between
two events. Each page of the ring buffer holds a full 64 bit timestamp.
Each event has a 27 bit delta stamp from the last event. The unit of time
is nanoseconds, so 27 bits can hold ~134 milliseconds. If two events
happen more than 134 milliseconds apart, a time extend is inserted
to add more bits for the delta. The time extend has 59 bits, which
is good for ~18 years.
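
The figures above follow directly from the bit widths. As a small
worked sketch (the macro and function names are made up for this
example and are not the ring buffer's own identifiers):

```c
#include <assert.h>
#include <stdint.h>

#define DELTA_BITS        27u  /* per-event delta width */
#define TIME_EXTEND_BITS  59u  /* time extend width */

/* Largest time span, in nanoseconds, that fits in `bits` bits. */
static uint64_t max_span_ns(unsigned bits)
{
	assert(bits < 64);
	return (UINT64_C(1) << bits) - 1;
}

/* A delta that does not fit in 27 bits needs a time extend. */
static int delta_needs_time_extend(uint64_t delta_ns)
{
	return delta_ns > max_span_ns(DELTA_BITS);
}
```

2^27 ns is 134,217,728 ns, or about 134 milliseconds, and 2^59 ns is
about 5.76 * 10^17 ns, or roughly 18 years.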

Currently the time extend is committed separately from the event.
If an event is discarded before it is committed, due to filtering,
the time extend still exists. If all events are being filtered, then
after ~134 milliseconds a new time extend will be added to the buffer.

This can only happen till the end of the page. Since each page holds
a full timestamp, there is no reason to add a time extend to the
beginning of a page. Time extends can only fill a page that has actual
data at the beginning, so there is no fear that time extends will fill
more than a page without any data.

When reading an event, a loop is made to skip over time extends
since they are only used to maintain the time stamp and are never
given to the caller. As a paranoid check to prevent the loop running
forever, with the knowledge that time extends may only fill a page,
a check is made that tests the iteration of the loop, and if the
iteration is more than the number of time extends that can fit in a page
a warning is printed and the ring buffer is disabled (all of ftrace
is also disabled with it).

There is another event type that is called a TIMESTAMP which can
hold 64 bits of data in the theoretical case that two events happen
18 years apart. This code has not been implemented, but the name
of this event exists, as well as the structure for it. The
size of a TIMESTAMP is 16 bytes, whereas a time extend is only
8 bytes. The macro used to calculate how many time extends can fit on
a page used the TIMESTAMP size instead of the time extend size,
cutting the amount in half.
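
The effect of dividing by the wrong record size can be sketched as
follows. The names are hypothetical, and the 4096-byte data area is an
assumption for illustration; the usable area of a real buffer page is
somewhat smaller.

```c
#include <assert.h>

#define ASSUMED_PAGE_BYTES 4096u  /* assumption, not the real data size */
#define TIME_EXTEND_BYTES     8u
#define TIMESTAMP_BYTES      16u

/* How many records of `record_len` bytes fit in one page. */
static unsigned records_per_page(unsigned page_bytes, unsigned record_len)
{
	assert(record_len != 0);
	return page_bytes / record_len;
}

/* Paranoid loop check: too many skipped time extends is "impossible". */
static int loop_limit_exceeded(unsigned iterations, unsigned limit)
{
	return iterations > limit;
}
```

With these assumed sizes the mistaken divisor gives a limit of 256
while 512 time extends actually fit, so a page holding, say, 300 time
extends trips the check even though nothing is wrong.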

The following test case can easily trigger the warning since we only
need to have half the page filled with time extends to trigger the
warning:

 # cd /sys/kernel/debug/tracing/
 # echo function > current_tracer
 # echo 'common_pid < 0' > events/ftrace/function/filter
 # echo > trace
 # echo 1 > trace_marker
 # sleep 120
 # cat trace

This enables the function tracer, sets the filter to only trace
functions where the process id is negative (no events), clears
the trace buffer to ensure that we have nothing in the buffer,
writes to trace_marker to add an event to the beginning of a page,
sleeps for 2 minutes (only 35 seconds is probably needed, but this
guarantees the bug), and finally reads the trace, which
triggers the bug.
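
The 35-second estimate can be checked with a rough calculation: one
time extend is added about every 2^27 ns, and the warning fires once
the count passes the halved limit. This sketch assumes a 4096-byte
data area per page, and the names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define TIMESTAMP_BYTES 16u                 /* the mistaken divisor */
#define MAX_DELTA_NS    (UINT64_C(1) << 27) /* ~134 ms between extends */

/* Approximate time until the buggy limit is reached, in ns. */
static uint64_t ns_until_false_warning(uint64_t page_bytes)
{
	assert(page_bytes >= TIMESTAMP_BYTES);
	uint64_t buggy_limit = page_bytes / TIMESTAMP_BYTES;
	return buggy_limit * MAX_DELTA_NS;
}
```

For a 4096-byte page this gives 256 * 134,217,728 ns, or about 34.4
seconds, which is consistent with the estimate in the text.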

This patch fixes the typo and prevents the false positive of that warning.

Reported-by: Hans J. Koch 
Tested-by: Hans J. Koch 
Cc: Thomas Gleixner 
Cc: Stable Kernel 
Signed-off-by: Steven Rostedt