aboutsummaryrefslogtreecommitdiffstats
path: root/include/litmus
diff options
context:
space:
mode:
authorBjoern Brandenburg <bbb@mpi-sws.org>2015-08-09 07:18:48 -0400
committerBjoern Brandenburg <bbb@mpi-sws.org>2017-05-26 17:12:28 -0400
commit3baa55c19ffb567aa48568fa69dd17ad6f70d31d (patch)
tree7e79fd398705929f2db40ba239895cc60762f61f /include/litmus
parentcbe61859a233702ed8e6723b3b133d1f2ae1ae2c (diff)
Add LITMUS^RT core implementation
This patch adds the core of LITMUS^RT: - library functionality (heaps, rt_domain, prioritization, etc.) - budget enforcement logic - job management - system call backends - virtual devices (control page, etc.) - scheduler plugin API (and dummy plugin) This code compiles, but is not yet integrated with the rest of Linux. Squashed changes: LITMUS^RT Core: add get_current_budget() system call Allow userspace to figure out the used-up and remaining budget of a task. Adds deadline field to control page and updates it when setting up jobs for release. Adds control page deadline offset ftdev: respect O_NONBLOCK flag in ftdev_read() Don't block if userspace wants to go on doing something else. Export job release time and job sequence number in ctrl page Add alternate complete_job() default implementation Let jobs sleep like regular Linux tasks by suspending and waking them with a one-shot timer. Plugins can opt into using this implementation instead of the classic complete_job() implementation (or custom implementations). Fix RCU locking in sys_get_rt_task_param() sys_get_rt_task_param() is rarely used and apparently attracted some bitrot. Free before setting NULL to prevent memory leak Add hrtimer_start_on() support This patch replaces the previous implementation of hrtimer_start_on() by now using smp_call_function_single_async() to arm hrtimers on remote CPUs. Expose LITMUS^RT system calls via control page ioctl() Rationale: make LITMUS^RT ops available in a way that does not create merge conflicts each time we rebase LITMUS^RT on top of a new kernel version. This also helps with portability to different architectures, as we no longer need to patch each architecture's syscall table. Pick non-zero syscall ID start range To avoid interfering with Linux's magic reserved IOCTL numbers Don't preempt before time check in sleep_until_next_release() Avoid preempting jobs that are about to go to sleep soon anyway. 
LITMUS^RT proc: fix wrong memset() TRACE(): add TRACE_WARN_ON() helper Useful to replace BUG_ON() and WARN_ON() with a non-fatal TRACE()-based equivalent. Add void* plugin_state pointer to task_struct LITMUS^RT: split task admission into two functions Plugin interface: add fork_task() callback LITMUS^RT: Enable plugins to permit RT tasks to fork one-shot complete_job(): set completed flag This could race with a SIGSTOP or some other forced suspension, but we'll let plugins handle this, should they actually care. FP: add list-based ready queue LITMUS^RT core: add should_wait_for_stack() callback Allow plugins to give up when waiting for a stack to become available. LITMUS^RT core: add next_became_invalid() callback LITMUS^RT core: add post-migration validation callback LITMUS^RT core: be more careful when pull-migrating tasks Close more race windows and give plugins a chance to validate tasks after they have been migrated. Add KConfig options for timer latency warnings Add reservation creation API to plugin interface & syscalls LITMUS^RT syscall: expose sys_reservation_create() via ioctl() Add reservation configuration types to rt_param.h Add basic generic reservation-based scheduling infrastructure Switch to aligned quanta by default. For first-time users, aligned quanta is likely what's expected. LITMUS^RT core: keep track of time of last suspension This information is needed to insert ST_COMPLETION records for sporadic tasks. add fields for clock_nanosleep() support Need to communicate the intended wake-up time to the plugin wake-up handler. LITMUS^RT core: add generic handler for sporadic job arrivals In particular, check if a job arrival is triggered from a clock_nanosleep() call. add litmus->task_change_params() callback to plugin interface Will be used by adaptive C-EDF. 
Call litmus->task_change_params() from sys_set_rt_task_param() Move trace point definition to litmus/litmus.c If !CONFIG_SCHED_TASK_TRACE, but CONFIG_SCHED_LITMUS_TRACEPOINT, then we still need to define the tracepoint structures. This patch should be integrated with the earlier sched_task_trace.c patches during one of the next major rebasing efforts. LITMUS^RT scheduling class: mark enqueued task as present Remove unistd_*.h rebase fix: update to new hrtimer API The new API is actually nicer and cleaner. rebase fix: call lockdep_unpin_lock(&rq->lock, cookie) The LITMUS^RT scheduling class should also do the LOCKDEP dance. LITMUS^RT core: break out non-preemptive flag defs Not every file including litmus.h needs to know this. LITMUS^RT core: don't include debug_trace.h in litmus.h Including debug_trace.h introduces the TRACE() macro, which causes symbol clashes in some (rather obscure) drivers. LITMUS^RT core: add litmus_preemption_in_progress flags Used to communicate that a preemption is in progress. Set by the scheduler; read by the plugins. LITMUS^RT core: revise is_current_running() macro
Diffstat (limited to 'include/litmus')
-rw-r--r--include/litmus/affinity.h52
-rw-r--r--include/litmus/bheap.h77
-rw-r--r--include/litmus/binheap.h205
-rw-r--r--include/litmus/budget.h38
-rw-r--r--include/litmus/ceiling.h36
-rw-r--r--include/litmus/clustered.h46
-rw-r--r--include/litmus/ctrlpage.h105
-rw-r--r--include/litmus/debug_trace.h5
-rw-r--r--include/litmus/edf_common.h25
-rw-r--r--include/litmus/fdso.h78
-rw-r--r--include/litmus/fp_common.h183
-rw-r--r--include/litmus/fpmath.h147
-rw-r--r--include/litmus/jobs.h13
-rw-r--r--include/litmus/litmus.h163
-rw-r--r--include/litmus/litmus_proc.h63
-rw-r--r--include/litmus/locking.h28
-rw-r--r--include/litmus/np.h121
-rw-r--r--include/litmus/preempt.h188
-rw-r--r--include/litmus/reservations/alloc.h15
-rw-r--r--include/litmus/reservations/budget-notifier.h50
-rw-r--r--include/litmus/reservations/polling.h19
-rw-r--r--include/litmus/reservations/reservation.h224
-rw-r--r--include/litmus/reservations/table-driven.h23
-rw-r--r--include/litmus/rt_domain.h182
-rw-r--r--include/litmus/rt_param.h105
-rw-r--r--include/litmus/sched_plugin.h180
-rw-r--r--include/litmus/srp.h28
-rw-r--r--include/litmus/wait.h57
28 files changed, 2403 insertions, 53 deletions
diff --git a/include/litmus/affinity.h b/include/litmus/affinity.h
new file mode 100644
index 000000000000..4d7c618c8175
--- /dev/null
+++ b/include/litmus/affinity.h
@@ -0,0 +1,52 @@
1#ifndef __LITMUS_AFFINITY_H
2#define __LITMUS_AFFINITY_H
3
4#include <linux/cpumask.h>
5
6/* Works like:
7void get_nearest_available_cpu(
8 cpu_entry_t **nearest,
9 cpu_entry_t *start,
10 cpu_entry_t *entries,
11 int release_master,
12 cpumask_var_t cpus_to_test)
13
14Set release_master = NO_CPU for no Release Master.
15
16We use a macro here to exploit the fact that C-EDF and G-EDF
17have similar structures for their cpu_entry_t structs, even though
18they do not share a common base-struct. The macro allows us to
19avoid code duplication.
20
21 */
/* NOTE(review): this is a statement macro, not an expression. Its
 * arguments (start, entries, release_master, cpus_to_test) may be
 * evaluated multiple times -- avoid side effects at call sites.
 * Requires a cpu_entry_t type with ->linked and ->cpu fields in scope
 * at the expansion site (see the comment above for rationale).
 */
22#define get_nearest_available_cpu(nearest, start, entries, release_master, cpus_to_test) \
23{ \
24 (nearest) = NULL; \
25 if (!(start)->linked && likely((start)->cpu != (release_master))) { \
26 (nearest) = (start); \
27 } else { \
28 int __cpu; \
29 \
30 /* FIXME: get rid of the iteration with a bitmask + AND */ \
31 for_each_cpu(__cpu, cpus_to_test) { \
32 if (likely(__cpu != release_master)) { \
33 cpu_entry_t *__entry = &per_cpu((entries), __cpu); \
34 if (cpus_share_cache((start)->cpu, __entry->cpu) \
35 && !__entry->linked) { \
36 (nearest) = __entry; \
37 break; \
38 } \
39 } \
40 } \
41 } \
42 \
43 if ((nearest)) { \
44 TRACE("P%d is closest available CPU to P%d\n", \
45 (nearest)->cpu, (start)->cpu); \
46 } else { \
47 TRACE("Could not find an available CPU close to P%d\n", \
48 (start)->cpu); \
49 } \
50}
51
52#endif
diff --git a/include/litmus/bheap.h b/include/litmus/bheap.h
new file mode 100644
index 000000000000..cf4864a498d8
--- /dev/null
+++ b/include/litmus/bheap.h
@@ -0,0 +1,77 @@
1/* bheap.h -- Binomial Heaps
2 *
3 * (c) 2008, 2009 Bjoern Brandenburg
4 */
5
6#ifndef BHEAP_H
7#define BHEAP_H
8
9#define NOT_IN_HEAP UINT_MAX
10
11struct bheap_node {
12 struct bheap_node* parent;
13 struct bheap_node* next;
14 struct bheap_node* child;
15
16 unsigned int degree;
17 void* value;
18 struct bheap_node** ref;
19};
20
21struct bheap {
22 struct bheap_node* head;
23 /* We cache the minimum of the heap.
24 * This speeds up repeated peek operations.
25 */
26 struct bheap_node* min;
27};
28
29typedef int (*bheap_prio_t)(struct bheap_node* a, struct bheap_node* b);
30
31void bheap_init(struct bheap* heap);
32void bheap_node_init(struct bheap_node** ref_to_bheap_node_ptr, void* value);
33
34static inline int bheap_node_in_heap(struct bheap_node* h)
35{
36 return h->degree != NOT_IN_HEAP;
37}
38
39static inline int bheap_empty(struct bheap* heap)
40{
41 return heap->head == NULL && heap->min == NULL;
42}
43
44/* insert (and reinitialize) a node into the heap */
45void bheap_insert(bheap_prio_t higher_prio,
46 struct bheap* heap,
47 struct bheap_node* node);
48
49/* merge addition into target */
50void bheap_union(bheap_prio_t higher_prio,
51 struct bheap* target,
52 struct bheap* addition);
53
54struct bheap_node* bheap_peek(bheap_prio_t higher_prio,
55 struct bheap* heap);
56
57struct bheap_node* bheap_take(bheap_prio_t higher_prio,
58 struct bheap* heap);
59
60void bheap_uncache_min(bheap_prio_t higher_prio, struct bheap* heap);
61int bheap_decrease(bheap_prio_t higher_prio, struct bheap_node* node);
62
63void bheap_delete(bheap_prio_t higher_prio,
64 struct bheap* heap,
65 struct bheap_node* node);
66
67/* allocate from memcache */
68struct bheap_node* bheap_node_alloc(int gfp_flags);
69void bheap_node_free(struct bheap_node* hn);
70
71/* allocate a heap node for value and insert into the heap */
72int bheap_add(bheap_prio_t higher_prio, struct bheap* heap,
73 void* value, int gfp_flags);
74
75void* bheap_take_del(bheap_prio_t higher_prio,
76 struct bheap* heap);
77#endif
diff --git a/include/litmus/binheap.h b/include/litmus/binheap.h
new file mode 100644
index 000000000000..1cf364701da8
--- /dev/null
+++ b/include/litmus/binheap.h
@@ -0,0 +1,205 @@
1#ifndef LITMUS_BINARY_HEAP_H
2#define LITMUS_BINARY_HEAP_H
3
4#include <linux/kernel.h>
5
6/**
7 * Simple binary heap with add, arbitrary delete, delete_root, and top
8 * operations.
9 *
10 * Style meant to conform with list.h.
11 *
12 * Motivation: Linux's prio_heap.h is of fixed size. Litmus's binomial
13 * heap may be overkill (and perhaps not general enough) for some applications.
14 *
15 * Note: In order to make node swaps fast, a node inserted with a data pointer
16 * may not always hold said data pointer. This is similar to the binomial heap
17 * implementation. This does make node deletion tricky since we have to
18 * (1) locate the node that holds the data pointer to delete, and (2) the
19 * node that was originally inserted with said data pointer. These have to be
20 * coalesced into a single node before removal (see usage of
21 * __binheap_safe_swap()). We have to track node references to accomplish this.
22 */
23
24struct binheap_node {
25 void *data;
26 struct binheap_node *parent;
27 struct binheap_node *left;
28 struct binheap_node *right;
29
30 /* pointer to binheap_node that holds *data for which this binheap_node
31 * was originally inserted. (*data "owns" this node)
32 */
33 struct binheap_node *ref;
34 struct binheap_node **ref_ptr;
35};
36
37/**
38 * Signature of comparator function. Assumed 'less-than' (min-heap).
39 * Pass in 'greater-than' for max-heap.
40 *
41 * TODO: Consider macro-based implementation that allows comparator to be
42 * inlined (similar to Linux red/black tree) for greater efficiency.
43 */
44typedef int (*binheap_order_t)(struct binheap_node *a,
45 struct binheap_node *b);
46
47
48struct binheap {
49 struct binheap_node *root;
50
51 /* pointer to node to take next inserted child */
52 struct binheap_node *next;
53
54 /* pointer to last node in complete binary tree */
55 struct binheap_node *last;
56
57 /* comparator function pointer */
58 binheap_order_t compare;
59};
60
61
62/* Initialized heap nodes not in a heap have parent
63 * set to BINHEAP_POISON.
64 */
65#define BINHEAP_POISON ((void*)(0xdeadbeef))
66
67
68/**
69 * binheap_entry - get the struct for this heap node.
70 * Only valid when called upon heap nodes other than the root handle.
71 * @ptr: the heap node.
72 * @type: the type of struct pointed to by binheap_node::data.
73 * @member: unused.
74 */
75#define binheap_entry(ptr, type, member) \
76((type *)((ptr)->data))
77
78/**
79 * binheap_node_container - get the struct that contains this node.
80 * Only valid when called upon heap nodes other than the root handle.
81 * @ptr: the heap node.
82 * @type: the type of struct the node is embedded in.
83 * @member: the name of the binheap_struct within the (type) struct.
84 */
85#define binheap_node_container(ptr, type, member) \
86container_of((ptr), type, member)
87
88/**
89 * binheap_top_entry - get the struct for the node at the top of the heap.
90 * Only valid when called upon the heap handle node.
91 * @ptr: the special heap-handle node.
92 * @type: the type of the struct the head is embedded in.
93 * @member: the name of the binheap_struct within the (type) struct.
94 */
95#define binheap_top_entry(ptr, type, member) \
96binheap_entry((ptr)->root, type, member)
97
98/**
99 * binheap_delete_root - remove the root element from the heap.
100 * @handle: handle to the heap.
101 * @type: the type of the struct the head is embedded in.
102 * @member: the name of the binheap_struct within the (type) struct.
103 */
104#define binheap_delete_root(handle, type, member) \
105__binheap_delete_root((handle), &((type *)((handle)->root->data))->member)
106
107/**
108 * binheap_delete - remove an arbitrary element from the heap.
109 * @to_delete: pointer to node to be removed.
110 * @handle: handle to the heap.
111 */
112#define binheap_delete(to_delete, handle) \
113__binheap_delete((to_delete), (handle))
114
115/**
116 * binheap_add - insert an element to the heap
117 * new_node: node to add.
118 * @handle: handle to the heap.
119 * @type: the type of the struct the head is embedded in.
120 * @member: the name of the binheap_struct within the (type) struct.
121 */
122#define binheap_add(new_node, handle, type, member) \
123__binheap_add((new_node), (handle), container_of((new_node), type, member))
124
125/**
126 * binheap_decrease - re-eval the position of a node (based upon its
127 * original data pointer).
128 * @handle: handle to the heap.
129 * @orig_node: node that was associated with the data pointer
130 * (whose value has changed) when said pointer was
131 * added to the heap.
132 */
133#define binheap_decrease(orig_node, handle) \
134__binheap_decrease((orig_node), (handle))
135
136#define BINHEAP_NODE_INIT() { NULL, BINHEAP_POISON, NULL, NULL , NULL, NULL}
137
138#define BINHEAP_NODE(name) \
139 struct binheap_node name = BINHEAP_NODE_INIT()
140
141
142static inline void INIT_BINHEAP_NODE(struct binheap_node *n)
143{
144 n->data = NULL;
145 n->parent = BINHEAP_POISON;
146 n->left = NULL;
147 n->right = NULL;
148 n->ref = NULL;
149 n->ref_ptr = NULL;
150}
151
152static inline void INIT_BINHEAP_HANDLE(struct binheap *handle,
153 binheap_order_t compare)
154{
155 handle->root = NULL;
156 handle->next = NULL;
157 handle->last = NULL;
158 handle->compare = compare;
159}
160
161/* Returns true if binheap is empty. */
162static inline int binheap_empty(struct binheap *handle)
163{
164 return(handle->root == NULL);
165}
166
167/* Returns true if binheap node is in a heap. */
168static inline int binheap_is_in_heap(struct binheap_node *node)
169{
170 return (node->parent != BINHEAP_POISON);
171}
172
173/* Returns true if binheap node is in given heap. */
174int binheap_is_in_this_heap(struct binheap_node *node, struct binheap* heap);
175
176/* Add a node to a heap */
177void __binheap_add(struct binheap_node *new_node,
178 struct binheap *handle,
179 void *data);
180
181/**
182 * Removes the root node from the heap. The node is removed after coalescing
183 * the binheap_node with its original data pointer at the root of the tree.
184 *
185 * The 'last' node in the tree is then swapped up to the root and bubbled
186 * down.
187 */
188void __binheap_delete_root(struct binheap *handle,
189 struct binheap_node *container);
190
191/**
192 * Delete an arbitrary node. Bubble node to delete up to the root,
193 * and then delete to root.
194 */
195void __binheap_delete(struct binheap_node *node_to_delete,
196 struct binheap *handle);
197
198/**
199 * Bubble up a node whose pointer has decreased in value.
200 */
201void __binheap_decrease(struct binheap_node *orig_node,
202 struct binheap *handle);
203
204
205#endif
diff --git a/include/litmus/budget.h b/include/litmus/budget.h
new file mode 100644
index 000000000000..60eb814fc82b
--- /dev/null
+++ b/include/litmus/budget.h
@@ -0,0 +1,38 @@
1#ifndef _LITMUS_BUDGET_H_
2#define _LITMUS_BUDGET_H_
3
4/* Update the per-processor enforcement timer (arm/reprogram/cancel) for
5 * the next task. */
6void update_enforcement_timer(struct task_struct* t);
7
/* True iff t's current job has consumed its entire execution budget.
 * (Changed nonstandard `inline static` to the kernel-conventional
 * `static inline`, matching requeue_preempted_job() below.) */
static inline int budget_exhausted(struct task_struct* t)
{
	return get_exec_time(t) >= get_exec_cost(t);
}
12
13inline static lt_t budget_remaining(struct task_struct* t)
14{
15 if (!budget_exhausted(t))
16 return get_exec_cost(t) - get_exec_time(t);
17 else
18 /* avoid overflow */
19 return 0;
20}
21
22#define budget_enforced(t) (tsk_rt(t)->task_params.budget_policy != NO_ENFORCEMENT)
23
24#define budget_precisely_enforced(t) (tsk_rt(t)->task_params.budget_policy \
25 == PRECISE_ENFORCEMENT)
26
/* Should a preempted job be added back to the ready queue?
 * Yes, iff the task exists, its job has not completed, and it either
 * still has budget left or is not subject to budget enforcement.
 * t may be NULL. */
static inline int requeue_preempted_job(struct task_struct* t)
{
	if (!t || is_completed(t))
		return 0;

	return !budget_exhausted(t) || !budget_enforced(t);
}
35
36void litmus_current_budget(lt_t *used_so_far, lt_t *remaining);
37
38#endif
diff --git a/include/litmus/ceiling.h b/include/litmus/ceiling.h
new file mode 100644
index 000000000000..f3d3889315f7
--- /dev/null
+++ b/include/litmus/ceiling.h
@@ -0,0 +1,36 @@
1#ifndef _LITMUS_CEILING_H_
2#define _LITMUS_CEILING_H_
3
4#ifdef CONFIG_LITMUS_LOCKING
5
6void __srp_ceiling_block(struct task_struct *cur);
7
8DECLARE_PER_CPU(int, srp_objects_in_use);
9
10/* assumes preemptions off */
11void srp_ceiling_block(void)
12{
13 struct task_struct *tsk = current;
14
15 /* Only applies to real-time tasks. */
16 if (!is_realtime(tsk))
17 return;
18
19 /* Bail out early if there aren't any SRP resources around. */
20 if (likely(!raw_cpu_read(srp_objects_in_use)))
21 return;
22
23 /* Avoid recursive ceiling blocking. */
24 if (unlikely(tsk->rt_param.srp_non_recurse))
25 return;
26
27 /* must take slow path */
28 __srp_ceiling_block(tsk);
29}
30
31#else
32#define srp_ceiling_block() /* nothing */
33#endif
34
35
36#endif \ No newline at end of file
diff --git a/include/litmus/clustered.h b/include/litmus/clustered.h
new file mode 100644
index 000000000000..fc7f0f87966e
--- /dev/null
+++ b/include/litmus/clustered.h
@@ -0,0 +1,46 @@
1#ifndef CLUSTERED_H
2#define CLUSTERED_H
3
4/* Which cache level should be used to group CPUs into clusters?
5 * GLOBAL_CLUSTER means that all CPUs form a single cluster (just like under
6 * global scheduling).
7 */
8enum cache_level {
9 GLOBAL_CLUSTER = 0,
10 L1_CLUSTER = 1,
11 L2_CLUSTER = 2,
12 L3_CLUSTER = 3
13};
14
15int parse_cache_level(const char *str, enum cache_level *level);
16const char* cache_level_name(enum cache_level level);
17
18/* expose a cache level in a /proc dir */
19struct proc_dir_entry* create_cluster_file(struct proc_dir_entry* parent,
20 enum cache_level* level);
21
22
23
24struct scheduling_cluster {
25 unsigned int id;
26 /* list of CPUs that are part of this cluster */
27 struct list_head cpus;
28};
29
30struct cluster_cpu {
31 unsigned int id; /* which CPU is this? */
32 struct list_head cluster_list; /* List of the CPUs in this cluster. */
33 struct scheduling_cluster* cluster; /* The cluster that this CPU belongs to. */
34};
35
36int get_cluster_size(enum cache_level level);
37
38int assign_cpus_to_clusters(enum cache_level level,
39 struct scheduling_cluster* clusters[],
40 unsigned int num_clusters,
41 struct cluster_cpu* cpus[],
42 unsigned int num_cpus);
43
44int get_shared_cpu_map(cpumask_var_t mask, unsigned int cpu, unsigned int index);
45
46#endif
diff --git a/include/litmus/ctrlpage.h b/include/litmus/ctrlpage.h
new file mode 100644
index 000000000000..f7b03e1aedd6
--- /dev/null
+++ b/include/litmus/ctrlpage.h
@@ -0,0 +1,105 @@
1#ifndef _LITMUS_CTRLPAGE_H_
2#define _LITMUS_CTRLPAGE_H_
3
4#include <litmus/rt_param.h>
5
6union np_flag {
7 uint32_t raw;
8 struct {
9 /* Is the task currently in a non-preemptive section? */
10 uint32_t flag:31;
11 /* Should the task call into the scheduler? */
12 uint32_t preempt:1;
13 } np;
14};
15
16/* The definition of the data that is shared between the kernel and real-time
17 * tasks via a shared page (see litmus/ctrldev.c).
18 *
19 * WARNING: User space can write to this, so don't trust
20 * the correctness of the fields!
21 *
22 * This serves two purposes: to enable efficient signaling
23 * of non-preemptive sections (user->kernel) and
24 * delayed preemptions (kernel->user), and to export
25 * some real-time relevant statistics such as preemption and
26 * migration data to user space. We can't use a device to export
27 * statistics because we want to avoid system call overhead when
28 * determining preemption/migration overheads).
29 */
30struct control_page {
31 /* This flag is used by userspace to communicate non-preemptive
32 * sections. */
33 volatile __attribute__ ((aligned (8))) union np_flag sched;
34
35 /* Incremented by the kernel each time an IRQ is handled. */
36 volatile __attribute__ ((aligned (8))) uint64_t irq_count;
37
38 /* Locking overhead tracing: userspace records here the time stamp
39 * and IRQ counter prior to starting the system call. */
40 uint64_t ts_syscall_start; /* Feather-Trace cycles */
41 uint64_t irq_syscall_start; /* Snapshot of irq_count when the syscall
42 * started. */
43
44 lt_t deadline; /* Deadline for the currently executing job */
45 lt_t release; /* Release time of current job */
46 uint64_t job_index; /* Job sequence number of current job */
47
48 /* to be extended */
49};
50
51/* Expected offsets within the control page. */
52
53#define LITMUS_CP_OFFSET_SCHED 0
54#define LITMUS_CP_OFFSET_IRQ_COUNT 8
55#define LITMUS_CP_OFFSET_TS_SC_START 16
56#define LITMUS_CP_OFFSET_IRQ_SC_START 24
57#define LITMUS_CP_OFFSET_DEADLINE 32
58#define LITMUS_CP_OFFSET_RELEASE 40
59#define LITMUS_CP_OFFSET_JOB_INDEX 48
60
61/* System call emulation via ioctl() */
62
63typedef enum {
64 LRT_null_call = 2006,
65 LRT_set_rt_task_param,
66 LRT_get_rt_task_param,
67 LRT_reservation_create,
68 LRT_complete_job,
69 LRT_od_open,
70 LRT_od_close,
71 LRT_litmus_lock,
72 LRT_litmus_unlock,
73 LRT_wait_for_job_release,
74 LRT_wait_for_ts_release,
75 LRT_release_ts,
76 LRT_get_current_budget,
77} litmus_syscall_id_t;
78
79union litmus_syscall_args {
80 struct {
81 pid_t pid;
82 struct rt_task __user *param;
83 } get_set_task_param;
84
85 struct {
86 uint32_t type;
87 void __user *config;
88 } reservation_create;
89
90 struct {
91 uint32_t fd;
92 uint32_t obj_type;
93 uint32_t obj_id;
94 void __user *config;
95 } od_open;
96
97 struct {
98 lt_t __user *expended;
99 lt_t __user *remaining;
100 } get_current_budget;
101};
102
103
104#endif
105
diff --git a/include/litmus/debug_trace.h b/include/litmus/debug_trace.h
index 1266ac6a760c..a760631d4fca 100644
--- a/include/litmus/debug_trace.h
+++ b/include/litmus/debug_trace.h
@@ -37,4 +37,9 @@ extern atomic_t __log_seq_no;
37#define TRACE_CUR(fmt, args...) \ 37#define TRACE_CUR(fmt, args...) \
38 TRACE_TASK(current, fmt, ## args) 38 TRACE_TASK(current, fmt, ## args)
39 39
40#define TRACE_WARN_ON(cond) \
41 if (unlikely(cond)) \
42 TRACE("WARNING: '%s' [%s@%s:%d]\n", \
43 #cond, __FUNCTION__, __FILE__, __LINE__)
44
40#endif 45#endif
diff --git a/include/litmus/edf_common.h b/include/litmus/edf_common.h
new file mode 100644
index 000000000000..bbaf22ea7f12
--- /dev/null
+++ b/include/litmus/edf_common.h
@@ -0,0 +1,25 @@
1/*
2 * EDF common data structures and utility functions shared by all EDF
3 * based scheduler plugins
4 */
5
6/* CLEANUP: Add comments and make it less messy.
7 *
8 */
9
10#ifndef __UNC_EDF_COMMON_H__
11#define __UNC_EDF_COMMON_H__
12
13#include <litmus/rt_domain.h>
14
15void edf_domain_init(rt_domain_t* rt, check_resched_needed_t resched,
16 release_jobs_t release);
17
18int edf_higher_prio(struct task_struct* first,
19 struct task_struct* second);
20
21int edf_ready_order(struct bheap_node* a, struct bheap_node* b);
22
23int edf_preemption_needed(rt_domain_t* rt, struct task_struct *t);
24
25#endif
diff --git a/include/litmus/fdso.h b/include/litmus/fdso.h
new file mode 100644
index 000000000000..fd9b30dbfb34
--- /dev/null
+++ b/include/litmus/fdso.h
@@ -0,0 +1,78 @@
1/* fdso.h - file descriptor attached shared objects
2 *
3 * (c) 2007 B. Brandenburg, LITMUS^RT project
4 */
5
6#ifndef _LINUX_FDSO_H_
7#define _LINUX_FDSO_H_
8
9#include <linux/list.h>
10#include <asm/atomic.h>
11
12#include <linux/fs.h>
13#include <linux/slab.h>
14
15#define MAX_OBJECT_DESCRIPTORS 85
16
17typedef enum {
18 MIN_OBJ_TYPE = 0,
19
20 FMLP_SEM = 0,
21 SRP_SEM = 1,
22
23 MPCP_SEM = 2,
24 MPCP_VS_SEM = 3,
25 DPCP_SEM = 4,
26 PCP_SEM = 5,
27
28 DFLP_SEM = 6,
29
30 MAX_OBJ_TYPE = 6
31} obj_type_t;
32
33struct inode_obj_id {
34 struct list_head list;
35 atomic_t count;
36 struct inode* inode;
37
38 obj_type_t type;
39 void* obj;
40 unsigned int id;
41};
42
43struct fdso_ops;
44
45struct od_table_entry {
46 unsigned int used;
47
48 struct inode_obj_id* obj;
49 const struct fdso_ops* class;
50};
51
52struct fdso_ops {
53 int (*create)(void** obj_ref, obj_type_t type, void* __user);
54 void (*destroy)(obj_type_t type, void*);
55 int (*open) (struct od_table_entry*, void* __user);
56 int (*close) (struct od_table_entry*);
57};
58
59/* translate a userspace supplied od into the raw table entry
60 * returns NULL if od is invalid
61 */
62struct od_table_entry* get_entry_for_od(int od);
63
64/* translate a userspace supplied od into the associated object
65 * returns NULL if od is invalid
66 */
67static inline void* od_lookup(int od, obj_type_t type)
68{
69 struct od_table_entry* e = get_entry_for_od(od);
70 return e && e->obj->type == type ? e->obj->obj : NULL;
71}
72
73#define lookup_fmlp_sem(od)((struct pi_semaphore*) od_lookup(od, FMLP_SEM))
74#define lookup_srp_sem(od) ((struct srp_semaphore*) od_lookup(od, SRP_SEM))
75#define lookup_ics(od) ((struct ics*) od_lookup(od, ICS_ID))
76
77
78#endif
diff --git a/include/litmus/fp_common.h b/include/litmus/fp_common.h
new file mode 100644
index 000000000000..71c0d0142fc4
--- /dev/null
+++ b/include/litmus/fp_common.h
@@ -0,0 +1,183 @@
1/* Fixed-priority scheduler support.
2 */
3
4#ifndef __FP_COMMON_H__
5#define __FP_COMMON_H__
6
7#include <litmus/rt_domain.h>
8
9#include <asm/bitops.h>
10
11
12void fp_domain_init(rt_domain_t* rt, check_resched_needed_t resched,
13 release_jobs_t release);
14
15int fp_higher_prio(struct task_struct* first,
16 struct task_struct* second);
17
18int fp_ready_order(struct bheap_node* a, struct bheap_node* b);
19
20#define FP_PRIO_BIT_WORDS (LITMUS_MAX_PRIORITY / BITS_PER_LONG)
21
22#if (LITMUS_MAX_PRIORITY % BITS_PER_LONG)
23#error LITMUS_MAX_PRIORITY must be a multiple of BITS_PER_LONG
24#endif
25
26/* bitmask-indexed priority queue */
27struct fp_prio_queue {
28 unsigned long bitmask[FP_PRIO_BIT_WORDS];
29 struct bheap queue[LITMUS_MAX_PRIORITY];
30};
31
32void fp_prio_queue_init(struct fp_prio_queue* q);
33
34static inline void fpq_set(struct fp_prio_queue* q, unsigned int index)
35{
36 unsigned long *word = q->bitmask + (index / BITS_PER_LONG);
37 __set_bit(index % BITS_PER_LONG, word);
38}
39
40static inline void fpq_clear(struct fp_prio_queue* q, unsigned int index)
41{
42 unsigned long *word = q->bitmask + (index / BITS_PER_LONG);
43 __clear_bit(index % BITS_PER_LONG, word);
44}
45
46static inline unsigned int fpq_find(struct fp_prio_queue* q)
47{
48 int i;
49
50 /* loop optimizer should unroll this */
51 for (i = 0; i < FP_PRIO_BIT_WORDS; i++)
52 if (q->bitmask[i])
53 return __ffs(q->bitmask[i]) + i * BITS_PER_LONG;
54
55 return LITMUS_MAX_PRIORITY; /* nothing found */
56}
57
58static inline void fp_prio_add(struct fp_prio_queue* q, struct task_struct* t, unsigned int index)
59{
60 BUG_ON(index >= LITMUS_MAX_PRIORITY);
61 BUG_ON(bheap_node_in_heap(tsk_rt(t)->heap_node));
62
63 fpq_set(q, index);
64 bheap_insert(fp_ready_order, &q->queue[index], tsk_rt(t)->heap_node);
65}
66
67static inline void fp_prio_remove(struct fp_prio_queue* q, struct task_struct* t, unsigned int index)
68{
69 BUG_ON(!is_queued(t));
70
71 bheap_delete(fp_ready_order, &q->queue[index], tsk_rt(t)->heap_node);
72 if (likely(bheap_empty(&q->queue[index])))
73 fpq_clear(q, index);
74}
75
76static inline struct task_struct* fp_prio_peek(struct fp_prio_queue* q)
77{
78 unsigned int idx = fpq_find(q);
79 struct bheap_node* hn;
80
81 if (idx < LITMUS_MAX_PRIORITY) {
82 hn = bheap_peek(fp_ready_order, &q->queue[idx]);
83 return bheap2task(hn);
84 } else
85 return NULL;
86}
87
88static inline struct task_struct* fp_prio_take(struct fp_prio_queue* q)
89{
90 unsigned int idx = fpq_find(q);
91 struct bheap_node* hn;
92
93 if (idx < LITMUS_MAX_PRIORITY) {
94 hn = bheap_take(fp_ready_order, &q->queue[idx]);
95 if (likely(bheap_empty(&q->queue[idx])))
96 fpq_clear(q, idx);
97 return bheap2task(hn);
98 } else
99 return NULL;
100}
101
102int fp_preemption_needed(struct fp_prio_queue* q, struct task_struct *t);
103
104
105/* ******* list-based version ******** */
106
107/* bitmask-indexed priority queue */
108struct fp_ready_list {
109 unsigned long bitmask[FP_PRIO_BIT_WORDS];
110 struct list_head queue[LITMUS_MAX_PRIORITY];
111};
112
113void fp_ready_list_init(struct fp_ready_list* q);
114
115static inline void fp_rl_set(struct fp_ready_list* q, unsigned int index)
116{
117 unsigned long *word = q->bitmask + (index / BITS_PER_LONG);
118 __set_bit(index % BITS_PER_LONG, word);
119}
120
121static inline void fp_rl_clear(struct fp_ready_list* q, unsigned int index)
122{
123 unsigned long *word = q->bitmask + (index / BITS_PER_LONG);
124 __clear_bit(index % BITS_PER_LONG, word);
125}
126
127static inline unsigned int fp_rl_find(struct fp_ready_list* q)
128{
129 int i;
130
131 /* loop optimizer should unroll this */
132 for (i = 0; i < FP_PRIO_BIT_WORDS; i++)
133 if (q->bitmask[i])
134 return __ffs(q->bitmask[i]) + i * BITS_PER_LONG;
135
136 return LITMUS_MAX_PRIORITY; /* nothing found */
137}
138
139static inline void fp_ready_list_add(
140 struct fp_ready_list* q, struct list_head* lh, unsigned int index)
141{
142 BUG_ON(index >= LITMUS_MAX_PRIORITY);
143 BUG_ON(in_list(lh));
144
145 fp_rl_set(q, index);
146 list_add_tail(lh, &q->queue[index]);
147}
148
149static inline void fp_ready_list_remove(
150 struct fp_ready_list* q, struct list_head* lh, unsigned int index)
151{
152 BUG_ON(!in_list(lh));
153
154 list_del(lh);
155 if (likely(list_empty(q->queue + index)))
156 fp_rl_clear(q, index);
157}
158
159static inline struct list_head* fp_ready_list_peek(struct fp_ready_list* q)
160{
161 unsigned int idx = fp_rl_find(q);
162
163 if (idx < LITMUS_MAX_PRIORITY) {
164 return q->queue[idx].next;
165 } else
166 return NULL;
167}
168
169static inline struct list_head* fp_ready_list_take(struct fp_ready_list* q)
170{
171 unsigned int idx = fp_rl_find(q);
172 struct list_head* lh;
173
174 if (idx < LITMUS_MAX_PRIORITY) {
175 lh = q->queue[idx].next;
176 fp_ready_list_remove(q, lh, idx);
177 return lh;
178 } else
179 return NULL;
180}
181
182
183#endif
diff --git a/include/litmus/fpmath.h b/include/litmus/fpmath.h
new file mode 100644
index 000000000000..642de98542c8
--- /dev/null
+++ b/include/litmus/fpmath.h
@@ -0,0 +1,147 @@
1#ifndef __FP_MATH_H__
2#define __FP_MATH_H__
3
4#include <linux/math64.h>
5
6#ifndef __KERNEL__
7#include <stdint.h>
8#define abs(x) (((x) < 0) ? -(x) : x)
9#endif
10
11// Use 64-bit because we want to track things at the nanosecond scale.
12// This can lead to very large numbers.
13typedef int64_t fpbuf_t;
14typedef struct
15{
16 fpbuf_t val;
17} fp_t;
18
19#define FP_SHIFT 10
20#define ROUND_BIT (FP_SHIFT - 1)
21
22#define _fp(x) ((fp_t) {x})
23
24#ifdef __KERNEL__
25static const fp_t LITMUS_FP_ZERO = {.val = 0};
26static const fp_t LITMUS_FP_ONE = {.val = (1 << FP_SHIFT)};
27#endif
28
/* Convert an integer to fixed point: shift left by FP_SHIFT (value is
 * scaled by 2^FP_SHIFT = 1024). */
29static inline fp_t FP(fpbuf_t x)
30{
31 return _fp(((fpbuf_t) x) << FP_SHIFT);
32}
33
34/* divide two integers to obtain a fixed point value */
35static inline fp_t _frac(fpbuf_t a, fpbuf_t b)
36{
37 return _fp(div64_s64(FP(a).val, (b)));
38}
39
/* Extract the fractional part (the low FP_SHIFT bits, still in raw
 * scaled form). NOTE(review): for negative values the C '%' operator
 * follows the dividend's sign, so the result can be negative. */
40static inline fpbuf_t _point(fp_t x)
41{
42 return (x.val % (1 << FP_SHIFT));
43
44}
45
46#define fp2str(x) x.val
47/*(x.val >> FP_SHIFT), (x.val % (1 << FP_SHIFT)) */
48#define _FP_ "%ld/1024"
49
/* Truncate a fixed-point value to an integer by discarding the
 * fractional bits (arithmetic right shift by FP_SHIFT). */
50static inline fpbuf_t _floor(fp_t x)
51{
52 return x.val >> FP_SHIFT;
53}
54
55/* FIXME: negative rounding */
/* Round to nearest integer: add the bit just below the binary point
 * (ROUND_BIT) to the floored value. Per the FIXME above, behavior for
 * negative values has not been validated. */
56static inline fpbuf_t _round(fp_t x)
57{
58 return _floor(x) + ((x.val >> ROUND_BIT) & 1);
59}
60
61/* multiply two fixed point values */
/* NOTE(review): a.val * b.val is a full 64-bit product before the
 * rescaling shift, so large operands can overflow — acceptable only
 * because callers keep magnitudes bounded (unverified here). */
62static inline fp_t _mul(fp_t a, fp_t b)
63{
64 return _fp((a.val * b.val) >> FP_SHIFT);
65}
66
/* Divide two fixed-point values. To keep the result scaled by
 * 2^FP_SHIFT, either the numerator is pre-shifted left (precise path)
 * or, when the numerator is already so large that the pre-shift would
 * overflow, the quotient is computed first and then shifted (loses
 * fractional precision but avoids overflow). */
67static inline fp_t _div(fp_t a, fp_t b)
68{
/* Outside the kernel, provide a no-op fallback for the unlikely()
 * branch-prediction hint so this header stays usable in userspace. */
69#if !defined(__KERNEL__) && !defined(unlikely)
70#define unlikely(x) (x)
71#define DO_UNDEF_UNLIKELY
72#endif
73 /* try not to overflow */
74 if (unlikely( a.val > (2l << ((sizeof(fpbuf_t)*8) - FP_SHIFT)) ))
75 return _fp((a.val / b.val) << FP_SHIFT);
76 else
77 return _fp((a.val << FP_SHIFT) / b.val);
78#ifdef DO_UNDEF_UNLIKELY
79#undef unlikely
80#undef DO_UNDEF_UNLIKELY
81#endif
82}
83
/* Fixed-point addition: since both operands carry the same 2^FP_SHIFT
 * scale factor, the raw values can simply be added. */
84static inline fp_t _add(fp_t a, fp_t b)
85{
86 return _fp(a.val + b.val);
87}
88
/* Fixed-point subtraction (raw difference, same scale reasoning). */
89static inline fp_t _sub(fp_t a, fp_t b)
90{
91 return _fp(a.val - b.val);
92}
93
/* Fixed-point negation. */
94static inline fp_t _neg(fp_t x)
95{
96 return _fp(-x.val);
97}
98
/* Fixed-point absolute value (uses abs(); see the userspace fallback
 * macro at the top of this header). */
99static inline fp_t _abs(fp_t x)
100{
101 return _fp(abs(x.val));
102}
103
104/* works the same as casting float/double to integer */
/* Truncate toward zero: floor of the magnitude, then re-apply the sign.
 * NOTE(review): x.val > 0 maps zero to sign -1, but _floor(_abs(0)) is
 * 0, so the result is still 0. */
105static inline fpbuf_t _fp_to_integer(fp_t x)
106{
107 return _floor(_abs(x)) * ((x.val > 0) ? 1 : -1);
108}
109
/* Convert an integer to fixed point via _frac(x, 1). */
110static inline fp_t _integer_to_fp(fpbuf_t x)
111{
112 return _frac(x,1);
113}
114
/* Comparison helpers: fixed-point values with the same scale factor
 * compare exactly like their raw integer representations, so each of
 * these simply compares .val fields. */
115static inline int _leq(fp_t a, fp_t b)
116{
117 return a.val <= b.val;
118}
119
120static inline int _geq(fp_t a, fp_t b)
121{
122 return a.val >= b.val;
123}
124
125static inline int _lt(fp_t a, fp_t b)
126{
127 return a.val < b.val;
128}
129
130static inline int _gt(fp_t a, fp_t b)
131{
132 return a.val > b.val;
133}
134
135static inline int _eq(fp_t a, fp_t b)
136{
137 return a.val == b.val;
138}
139
/* Return the larger of two fixed-point values (b wins ties' complement:
 * when equal, a is returned). */
140static inline fp_t _max(fp_t a, fp_t b)
141{
142 if (a.val < b.val)
143 return b;
144 else
145 return a;
146}
147#endif
diff --git a/include/litmus/jobs.h b/include/litmus/jobs.h
new file mode 100644
index 000000000000..7033393148df
--- /dev/null
+++ b/include/litmus/jobs.h
@@ -0,0 +1,13 @@
1#ifndef __LITMUS_JOBS_H__
2#define __LITMUS_JOBS_H__
3
4void prepare_for_next_period(struct task_struct *t);
5void release_at(struct task_struct *t, lt_t start);
6
7void inferred_sporadic_job_release_at(struct task_struct *t, lt_t when);
8
9long default_wait_for_release_at(lt_t release_time);
10long complete_job(void);
11long complete_job_oneshot(void);
12
13#endif
diff --git a/include/litmus/litmus.h b/include/litmus/litmus.h
index c87863c9b231..f550367ddd4b 100644
--- a/include/litmus/litmus.h
+++ b/include/litmus/litmus.h
@@ -6,7 +6,50 @@
6#ifndef _LINUX_LITMUS_H_ 6#ifndef _LINUX_LITMUS_H_
7#define _LINUX_LITMUS_H_ 7#define _LINUX_LITMUS_H_
8 8
9#include <litmus/ctrlpage.h>
10
11#ifdef CONFIG_RELEASE_MASTER
12extern atomic_t release_master_cpu;
13#endif
14
15/* in_list - is a given list_head queued on some list?
 *
 * Returns non-zero unless the node is in one of the two known
 * "not queued" states: (1) deleted via list_del(), which poisons the
 * pointers with LIST_POISON1/2, or (2) freshly initialized, where
 * next/prev point back at the node itself. Assumes nodes are only ever
 * in one of these three states (queued / deleted / initialized).
16 */
17static inline int in_list(struct list_head* list)
18{
19 return !( /* case 1: deleted */
20 (list->next == LIST_POISON1 &&
21 list->prev == LIST_POISON2)
22 ||
23 /* case 2: initialized */
24 (list->next == list &&
25 list->prev == list)
26 );
27}
28
29struct task_struct* __waitqueue_remove_first(wait_queue_head_t *wq);
30
31#define NO_CPU 0xffffffff
32
33void litmus_fork(struct task_struct *tsk);
34void litmus_exec(void);
35/* clean up real-time state of a task */
36void litmus_clear_state(struct task_struct *dead_tsk);
37void exit_litmus(struct task_struct *dead_tsk);
38
39/* Prevent the plugin from being switched-out from underneath a code
40 * path. Might sleep, so may be called only from non-atomic context. */
41void litmus_plugin_switch_disable(void);
42void litmus_plugin_switch_enable(void);
43
44long litmus_admit_task(struct task_struct *tsk);
45void litmus_exit_task(struct task_struct *tsk);
46void litmus_dealloc(struct task_struct *tsk);
47void litmus_do_exit(struct task_struct *tsk);
48int litmus_be_migrate_to(int cpu);
49
9#define is_realtime(t) ((t)->policy == SCHED_LITMUS) 50#define is_realtime(t) ((t)->policy == SCHED_LITMUS)
51#define rt_transition_pending(t) \
52 ((t)->rt_param.transition_pending)
10 53
11#define tsk_rt(t) (&(t)->rt_param) 54#define tsk_rt(t) (&(t)->rt_param)
12 55
@@ -28,6 +71,7 @@
28#define get_partition(t) (tsk_rt(t)->task_params.cpu) 71#define get_partition(t) (tsk_rt(t)->task_params.cpu)
29#define get_priority(t) (tsk_rt(t)->task_params.priority) 72#define get_priority(t) (tsk_rt(t)->task_params.priority)
30#define get_class(t) (tsk_rt(t)->task_params.cls) 73#define get_class(t) (tsk_rt(t)->task_params.cls)
74#define get_release_policy(t) (tsk_rt(t)->task_params.release_policy)
31 75
32/* job_param macros */ 76/* job_param macros */
33#define get_exec_time(t) (tsk_rt(t)->job_params.exec_time) 77#define get_exec_time(t) (tsk_rt(t)->job_params.exec_time)
@@ -35,6 +79,15 @@
35#define get_release(t) (tsk_rt(t)->job_params.release) 79#define get_release(t) (tsk_rt(t)->job_params.release)
36#define get_lateness(t) (tsk_rt(t)->job_params.lateness) 80#define get_lateness(t) (tsk_rt(t)->job_params.lateness)
37 81
82/* release policy macros */
83#define is_periodic(t) (get_release_policy(t) == TASK_PERIODIC)
84#define is_sporadic(t) (get_release_policy(t) == TASK_SPORADIC)
85#ifdef CONFIG_ALLOW_EARLY_RELEASE
86#define is_early_releasing(t) (get_release_policy(t) == TASK_EARLY)
87#else
88#define is_early_releasing(t) (0)
89#endif
90
38#define is_hrt(t) \ 91#define is_hrt(t) \
39 (tsk_rt(t)->task_params.cls == RT_CLASS_HARD) 92 (tsk_rt(t)->task_params.cls == RT_CLASS_HARD)
40#define is_srt(t) \ 93#define is_srt(t) \
@@ -48,6 +101,67 @@ static inline lt_t litmus_clock(void)
48 return ktime_to_ns(ktime_get()); 101 return ktime_to_ns(ktime_get());
49} 102}
50 103
104/* A macro to convert from nanoseconds to ktime_t. */
105#define ns_to_ktime(t) ktime_add_ns(ktime_set(0, 0), t)
106
107#define is_released(t, now) \
108 (lt_before_eq(get_release(t), now))
109#define is_tardy(t, now) \
110 (lt_before_eq(tsk_rt(t)->job_params.deadline, now))
111
112/* real-time comparison macros */
113#define earlier_deadline(a, b) (lt_before(\
114 (a)->rt_param.job_params.deadline,\
115 (b)->rt_param.job_params.deadline))
116#define earlier_release(a, b) (lt_before(\
117 (a)->rt_param.job_params.release,\
118 (b)->rt_param.job_params.release))
119
120void preempt_if_preemptable(struct task_struct* t, int on_cpu);
121
122#define bheap2task(hn) ((struct task_struct*) hn->value)
123
/* Non-zero iff 't' is a valid task pointer whose rt_param 'present'
 * flag is set (NULL-safe). */
124static inline int is_present(struct task_struct* t)
125{
126 return t && tsk_rt(t)->present;
127}
128
/* Non-zero iff 't' is a valid task pointer whose rt_param 'completed'
 * flag is set (NULL-safe). */
129static inline int is_completed(struct task_struct* t)
130{
131 return t && tsk_rt(t)->completed;
132}
133
134
135/* Used to convert ns-specified execution costs and periods into
136 * integral quanta equivalents.
137 */
138#define LITMUS_QUANTUM_LENGTH_NS (CONFIG_LITMUS_QUANTUM_LENGTH_US * 1000ULL)
139
140/* make the unit explicit */
141typedef unsigned long quanta_t;
142
143enum round {
144 FLOOR,
145 CEIL
146};
147
/* Convert a time value (ns) into scheduling quanta. do_div() leaves the
 * quotient in 'time' and returns the remainder; a non-zero remainder
 * bumps the result by one when rounding with CEIL. */
148static inline quanta_t time2quanta(lt_t time, enum round round)
149{
150 s64 quantum_length = LITMUS_QUANTUM_LENGTH_NS;
151
152 if (do_div(time, quantum_length) && round == CEIL)
153 time++;
154 return (quanta_t) time;
155}
156
/* Convert a quanta count back to nanoseconds (exact inverse only for
 * values that were whole multiples of the quantum length). */
157static inline lt_t quanta2time(quanta_t quanta)
158{
159 return quanta * LITMUS_QUANTUM_LENGTH_NS;
160}
161
162/* By how much is cpu staggered behind CPU 0? */
163u64 cpu_stagger_offset(int cpu);
164
51static inline struct control_page* get_control_page(struct task_struct *t) 165static inline struct control_page* get_control_page(struct task_struct *t)
52{ 166{
53 return tsk_rt(t)->ctrl_page; 167 return tsk_rt(t)->ctrl_page;
@@ -58,4 +172,53 @@ static inline int has_control_page(struct task_struct* t)
58 return tsk_rt(t)->ctrl_page != NULL; 172 return tsk_rt(t)->ctrl_page != NULL;
59} 173}
60 174
175
176#ifdef CONFIG_SCHED_OVERHEAD_TRACE
177
178#define TS_SYSCALL_IN_START \
179 if (has_control_page(current)) { \
180 __TS_SYSCALL_IN_START(&get_control_page(current)->ts_syscall_start); \
181 }
182
183#define TS_SYSCALL_IN_END \
184 if (has_control_page(current)) { \
185 unsigned long flags; \
186 uint64_t irqs; \
187 local_irq_save(flags); \
188 irqs = get_control_page(current)->irq_count - \
189 get_control_page(current)->irq_syscall_start; \
190 __TS_SYSCALL_IN_END(&irqs); \
191 local_irq_restore(flags); \
192 }
193
194#else
195
196#define TS_SYSCALL_IN_START
197#define TS_SYSCALL_IN_END
198
199#endif
200
201#ifdef CONFIG_SMP
202
203/*
204 * struct hrtimer_start_on_info - timer info on remote cpu
205 * @timer: timer to be triggered on remote cpu
206 * @time: time event
207 * @mode: timer mode
208 * @csd: smp_call_function parameter to call hrtimer_pull on remote cpu
209 */
210struct hrtimer_start_on_info {
211 struct hrtimer *timer;
212 ktime_t time;
213 enum hrtimer_mode mode;
214 struct call_single_data csd;
215};
216
217void hrtimer_pull(void *csd_info);
218extern void hrtimer_start_on(int cpu, struct hrtimer_start_on_info *info,
219 struct hrtimer *timer, ktime_t time,
220 const enum hrtimer_mode mode);
221
222#endif
223
61#endif 224#endif
diff --git a/include/litmus/litmus_proc.h b/include/litmus/litmus_proc.h
new file mode 100644
index 000000000000..a5db24c03ec0
--- /dev/null
+++ b/include/litmus/litmus_proc.h
@@ -0,0 +1,63 @@
1#include <litmus/sched_plugin.h>
2#include <linux/proc_fs.h>
3
4int __init init_litmus_proc(void);
5void exit_litmus_proc(void);
6
7struct cd_mapping
8{
9 int id;
10 cpumask_var_t mask;
11 struct proc_dir_entry *proc_file;
12};
13
14struct domain_proc_info
15{
16 int num_cpus;
17 int num_domains;
18
19 struct cd_mapping *cpu_to_domains;
20 struct cd_mapping *domain_to_cpus;
21};
22
23/*
24 * On success, returns 0 and sets the pointer to the location of the new
25 * proc dir entry, otherwise returns an error code and sets pde to NULL.
26 */
27long make_plugin_proc_dir(struct sched_plugin* plugin,
28 struct proc_dir_entry** pde);
29
30/*
31 * Plugins should deallocate all child proc directory entries before
32 * calling this, to avoid memory leaks.
33 */
34void remove_plugin_proc_dir(struct sched_plugin* plugin);
35
36/*
37 * Setup the CPU <-> sched domain mappings in proc
38 */
39long activate_domain_proc(struct domain_proc_info* map);
40
41/*
42 * Remove the CPU <-> sched domain mappings from proc
43 */
44long deactivate_domain_proc(void);
45
46/*
47 * Alloc memory for the mapping
48 * Note: Does not set up proc files. Use make_sched_domain_maps for that.
49 */
50long init_domain_proc_info(struct domain_proc_info* map,
51 int num_cpus, int num_domains);
52
53/*
54 * Free memory of the mapping
55 * Note: Does not clean up proc files. Use deactivate_domain_proc for that.
56 */
57void destroy_domain_proc_info(struct domain_proc_info* map);
58
59/* Copy at most size-1 bytes from ubuf into kbuf, null-terminate buf, and
60 * remove a '\n' if present. Returns the number of bytes that were read or
61 * -EFAULT. */
62int copy_and_chomp(char *kbuf, unsigned long ksize,
63 __user const char* ubuf, unsigned long ulength);
diff --git a/include/litmus/locking.h b/include/litmus/locking.h
new file mode 100644
index 000000000000..4d7b870cb443
--- /dev/null
+++ b/include/litmus/locking.h
@@ -0,0 +1,28 @@
1#ifndef LITMUS_LOCKING_H
2#define LITMUS_LOCKING_H
3
4struct litmus_lock_ops;
5
6/* Generic base struct for LITMUS^RT userspace semaphores.
7 * This structure should be embedded in protocol-specific semaphores.
8 */
9struct litmus_lock {
10 struct litmus_lock_ops *ops;
11 int type;
12};
13
14struct litmus_lock_ops {
15 /* Current task tries to obtain / drop a reference to a lock.
16 * Optional methods, allowed by default. */
17 int (*open)(struct litmus_lock*, void* __user);
18 int (*close)(struct litmus_lock*);
19
20 /* Current tries to lock/unlock this lock (mandatory methods). */
21 int (*lock)(struct litmus_lock*);
22 int (*unlock)(struct litmus_lock*);
23
24 /* The lock is no longer being referenced (mandatory method). */
25 void (*deallocate)(struct litmus_lock*);
26};
27
28#endif
diff --git a/include/litmus/np.h b/include/litmus/np.h
new file mode 100644
index 000000000000..dbe2b695f74a
--- /dev/null
+++ b/include/litmus/np.h
@@ -0,0 +1,121 @@
1#ifndef _LITMUS_NP_H_
2#define _LITMUS_NP_H_
3
4/* Definitions related to non-preemptive sections signaled via the control
5 * page
6 */
7
8#ifdef CONFIG_NP_SECTION
9
/* Non-zero iff the task is in a kernel-initiated non-preemptive
 * section (counter incremented by make_np(), decremented by take_np()). */
10static inline int is_kernel_np(struct task_struct *t)
11{
12 return tsk_rt(t)->kernel_np;
13}
14
/* Non-zero iff userspace has flagged a non-preemptive section via the
 * task's control page; 0 when the task has no control page. */
15static inline int is_user_np(struct task_struct *t)
16{
17 return tsk_rt(t)->ctrl_page ? tsk_rt(t)->ctrl_page->sched.np.flag : 0;
18}
19
/* Ask a task in a userspace non-preemptive section to yield: set the
 * 'preempt' flag in its control page so userspace calls back into the
 * kernel when the critical section ends. No-op if the task is not in a
 * user NP section. (Non-atomic variant; see request_exit_np_atomic().) */
20static inline void request_exit_np(struct task_struct *t)
21{
22 if (is_user_np(t)) {
23 /* Set the flag that tells user space to call
24 * into the kernel at the end of a critical section. */
25 if (likely(tsk_rt(t)->ctrl_page)) {
26 TRACE_TASK(t, "setting delayed_preemption flag\n");
27 tsk_rt(t)->ctrl_page->sched.np.preempt = 1;
28 }
29 }
30}
31
/* Enter a kernel non-preemptive section (sections nest: this is a
 * counter increment, not a flag). */
32static inline void make_np(struct task_struct *t)
33{
34 tsk_rt(t)->kernel_np++;
35}
36
37/* Leave a kernel non-preemptive section. Caller should check if
38 * preemption is necessary when the function returns 0 (i.e., when the
39 * outermost section was just exited).
 */
40static inline int take_np(struct task_struct *t)
41{
42 return --tsk_rt(t)->kernel_np;
43}
44
45/* returns 0 if remote CPU needs an IPI to preempt, 1 if no IPI is required */
/* Atomic variant of request_exit_np(): snapshot the control page's
 * combined np flag word, and only cmpxchg the 'preempt' bit in if the
 * task is still non-preemptive and the bit is not already set. A single
 * failed cmpxchg falls through to the IPI path rather than retrying, so
 * userspace cannot keep the kernel spinning here. */
46static inline int request_exit_np_atomic(struct task_struct *t)
47{
48 union np_flag old, new;
49
50 if (tsk_rt(t)->ctrl_page) {
51 old.raw = tsk_rt(t)->ctrl_page->sched.raw;
52 if (old.np.flag == 0) {
53 /* no longer non-preemptive */
54 return 0;
55 } else if (old.np.preempt) {
56 /* already set, nothing for us to do */
57 return 1;
58 } else {
59 /* non preemptive and flag not set */
60 new.raw = old.raw;
61 new.np.preempt = 1;
62 /* if we get old back, then we atomically set the flag */
63 return cmpxchg(&tsk_rt(t)->ctrl_page->sched.raw, old.raw, new.raw) == old.raw;
64 /* If we raced with a concurrent change, then so be
65 * it. Deliver it by IPI. We don't want an unbounded
66 * retry loop here since tasks might exploit that to
67 * keep the kernel busy indefinitely. */
68 }
69 } else
70 return 0;
71}
72
73#else
74
/* !CONFIG_NP_SECTION stubs: non-preemptive sections are compiled out,
 * so no task is ever NP and no IPI-avoidance is possible. Note that
 * make_np()/take_np() intentionally have no stubs — calling them
 * without CONFIG_NP_SECTION is a build error. */
75static inline int is_kernel_np(struct task_struct* t)
76{
77 return 0;
78}
79
80static inline int is_user_np(struct task_struct* t)
81{
82 return 0;
83}
84
85static inline void request_exit_np(struct task_struct *t)
86{
87 /* request_exit_np() shouldn't be called if !CONFIG_NP_SECTION */
88 BUG();
89}
90
/* Always report "IPI required"; see the comment above the real
 * implementation for the return-value convention. */
91static inline int request_exit_np_atomic(struct task_struct *t)
92{
93 return 0;
94}
95
96#endif
97
/* Clear a previously requested delayed preemption in the task's
 * control page (available in both NP-section configurations). */
98static inline void clear_exit_np(struct task_struct *t)
99{
100 if (likely(tsk_rt(t)->ctrl_page))
101 tsk_rt(t)->ctrl_page->sched.np.preempt = 0;
102}
103
/* Non-zero iff the task is non-preemptive for any reason (kernel NP
 * counter or userspace NP flag). The debug-trace build logs which of
 * the two sources triggered. */
104static inline int is_np(struct task_struct *t)
105{
106#ifdef CONFIG_SCHED_DEBUG_TRACE
107 int kernel, user;
108 kernel = is_kernel_np(t);
109 user = is_user_np(t);
110 if (kernel || user)
111 TRACE_TASK(t, " is non-preemptive: kernel=%d user=%d\n",
112
113 kernel, user);
114 return kernel || user;
115#else
116 return unlikely(is_kernel_np(t) || is_user_np(t));
117#endif
118}
119
120#endif
121
diff --git a/include/litmus/preempt.h b/include/litmus/preempt.h
new file mode 100644
index 000000000000..8f8bb635cb21
--- /dev/null
+++ b/include/litmus/preempt.h
@@ -0,0 +1,188 @@
1#ifndef LITMUS_PREEMPT_H
2#define LITMUS_PREEMPT_H
3
4#include <linux/types.h>
5#include <linux/cache.h>
6#include <linux/percpu.h>
7#include <asm/atomic.h>
8
9#include <litmus/debug_trace.h>
10
11DECLARE_PER_CPU(bool, litmus_preemption_in_progress);
12
13/* is_current_running() is legacy macro (and a hack) that is used to make
14 * the plugin logic, which still stems from the 2.6.20 era, work with current
15 * kernels.
16 *
17 * It used to honor the flag in the preempt_count variable that was
18 * set when scheduling is in progress. This doesn't exist anymore in recent
19 * Linux versions. Instead, Linux has moved to passing a 'preempt' flag to
20 * __schedule(). In particular, Linux ignores prev->state != TASK_RUNNING and
21 * does *not* process self-suspensions if an interrupt (i.e., a preemption)
22 * races with a task that is about to call schedule() anyway.
23 *
24 * The value of the 'preempt' flag in __schedule() is crucial
25 * information for some of the LITMUS^RT plugins, which must re-add
26 * soon-to-block tasks to the ready queue if the rest of the system doesn't
27 * process the preemption yet. Unfortunately, the flag is not passed to
28 * pick_next_task(). Hence, as a hack, we communicate it out of band via the
29 * global, per-core variable litmus_preemption_in_progress, which is set by
30 * the scheduler in __schedule() and read by the plugins via the
31 * is_current_running() macro.
32 */
33#define is_current_running() \
34 ((current)->state == TASK_RUNNING || \
35 this_cpu_read(litmus_preemption_in_progress))
36
37DECLARE_PER_CPU_SHARED_ALIGNED(atomic_t, resched_state);
38
39#ifdef CONFIG_PREEMPT_STATE_TRACE
40const char* sched_state_name(int s);
41#define TRACE_STATE(fmt, args...) TRACE("SCHED_STATE " fmt, args)
42#else
43#define TRACE_STATE(fmt, args...) /* ignore */
44#endif
45
46#define VERIFY_SCHED_STATE(x) \
47 do { int __s = get_sched_state(); \
48 if ((__s & (x)) == 0) \
49 TRACE_STATE("INVALID s=0x%x (%s) not " \
50 "in 0x%x (%s) [%s]\n", \
51 __s, sched_state_name(__s), \
52 (x), #x, __FUNCTION__); \
53 } while (0);
54
55#define TRACE_SCHED_STATE_CHANGE(x, y, cpu) \
56 TRACE_STATE("[P%d] 0x%x (%s) -> 0x%x (%s)\n", \
57 cpu, (x), sched_state_name(x), \
58 (y), sched_state_name(y))
59
60
61typedef enum scheduling_state {
62 TASK_SCHEDULED = (1 << 0), /* The currently scheduled task is the one that
63 * should be scheduled, and the processor does not
64 * plan to invoke schedule(). */
65 SHOULD_SCHEDULE = (1 << 1), /* A remote processor has determined that the
66 * processor should reschedule, but this has not
67 * been communicated yet (IPI still pending). */
68 WILL_SCHEDULE = (1 << 2), /* The processor has noticed that it has to
69 * reschedule and will do so shortly. */
70 TASK_PICKED = (1 << 3), /* The processor is currently executing schedule(),
71 * has selected a new task to schedule, but has not
72 * yet performed the actual context switch. */
73 PICKED_WRONG_TASK = (1 << 4), /* The processor has not yet performed the context
74 * switch, but a remote processor has already
75 * determined that a higher-priority task became
76 * eligible after the task was picked. */
77} sched_state_t;
78
/* Read the scheduling-state machine variable of a remote CPU. */
79static inline sched_state_t get_sched_state_on(int cpu)
80{
81 return atomic_read(&per_cpu(resched_state, cpu));
82}
83
/* Read the local CPU's scheduling state. */
84static inline sched_state_t get_sched_state(void)
85{
86 return atomic_read(this_cpu_ptr(&resched_state));
87}
88
/* Non-zero iff the local CPU's state is one of the OR-ed
 * 'possible_states' bits. */
89static inline int is_in_sched_state(int possible_states)
90{
91 return get_sched_state() & possible_states;
92}
93
/* Same as is_in_sched_state(), but for a remote CPU. */
94static inline int cpu_is_in_sched_state(int cpu, int possible_states)
95{
96 return get_sched_state_on(cpu) & possible_states;
97}
98
/* Unconditionally store a new local scheduling state (plain atomic_set,
 * no compare — use sched_state_transition() for conditional changes). */
99static inline void set_sched_state(sched_state_t s)
100{
101 TRACE_SCHED_STATE_CHANGE(get_sched_state(), s, smp_processor_id());
102 atomic_set(this_cpu_ptr(&resched_state), s);
103}
104
/* Atomically move the local CPU's state from 'from' to 'to'.
 * Returns 1 on success, 0 if the state was no longer 'from' (a remote
 * CPU may change the state concurrently, hence the cmpxchg). */
105static inline int sched_state_transition(sched_state_t from, sched_state_t to)
106{
107 sched_state_t old_state;
108
109 old_state = atomic_cmpxchg(this_cpu_ptr(&resched_state), from, to);
110 if (old_state == from) {
111 TRACE_SCHED_STATE_CHANGE(from, to, smp_processor_id());
112 return 1;
113 } else
114 return 0;
115}
116
/* Same conditional transition as sched_state_transition(), but applied
 * to an arbitrary CPU's state variable. */
117static inline int sched_state_transition_on(int cpu,
118 sched_state_t from,
119 sched_state_t to)
120{
121 sched_state_t old_state;
122
123 old_state = atomic_cmpxchg(&per_cpu(resched_state, cpu), from, to);
124 if (old_state == from) {
125 TRACE_SCHED_STATE_CHANGE(from, to, cpu);
126 return 1;
127 } else
128 return 0;
129}
130
131/* Plugins must call this function after they have decided which job to
132 * schedule next. IMPORTANT: this function must be called while still holding
133 * the lock that is used to serialize scheduling decisions.
134 *
135 * (Ideally, we would like to use runqueue locks for this purpose, but that
136 * would lead to deadlocks with the migration code.)
137 */
138static inline void sched_state_task_picked(void)
139{
140 VERIFY_SCHED_STATE(WILL_SCHEDULE);
141
142 /* WILL_SCHEDULE has only a local transition => simple store is ok */
143 set_sched_state(TASK_PICKED);
144}
145
146static inline void sched_state_entered_schedule(void)
147{
148 /* Update state for the case that we entered schedule() not due to
149 * set_tsk_need_resched() */
150 set_sched_state(WILL_SCHEDULE);
151}
152
153/* Called by schedule() to check if the scheduling decision is still valid
154 * after a context switch. Returns 1 if the CPU needs to reschedule.
 *
 * Only a TASK_PICKED state can be confirmed, and only via an atomic
 * transition to TASK_SCHEDULED, since a remote CPU may concurrently
 * invalidate the pick (PICKED_WRONG_TASK) or request a reschedule.
 */
155static inline int sched_state_validate_switch(void)
156{
157 int decision_ok = 0;
158
159 VERIFY_SCHED_STATE(PICKED_WRONG_TASK | TASK_PICKED | WILL_SCHEDULE);
160
161 if (is_in_sched_state(TASK_PICKED)) {
162 /* Might be good; let's try to transition out of this
163 * state. This must be done atomically since remote processors
164 * may try to change the state, too. */
165 decision_ok = sched_state_transition(TASK_PICKED, TASK_SCHEDULED);
166 }
167
168 if (!decision_ok)
169 TRACE_STATE("validation failed (%s)\n",
170 sched_state_name(get_sched_state()));
171
172 return !decision_ok;
173}
174
175/* State transition events. See litmus/preempt.c for details. */
176void sched_state_will_schedule(struct task_struct* tsk);
177void sched_state_ipi(void);
178/* Cause a CPU (remote or local) to reschedule. */
179void litmus_reschedule(int cpu);
180void litmus_reschedule_local(void);
181
182#ifdef CONFIG_DEBUG_KERNEL
183void sched_state_plugin_check(void);
184#else
185#define sched_state_plugin_check() /* no check */
186#endif
187
188#endif
diff --git a/include/litmus/reservations/alloc.h b/include/litmus/reservations/alloc.h
new file mode 100644
index 000000000000..b3471288c9f1
--- /dev/null
+++ b/include/litmus/reservations/alloc.h
@@ -0,0 +1,15 @@
1#ifndef LITMUS_RESERVATIONS_ALLOC_H
2#define LITMUS_RESERVATIONS_ALLOC_H
3
4#include <litmus/reservations/reservation.h>
5
6long alloc_polling_reservation(
7 int res_type,
8 struct reservation_config *config,
9 struct reservation **_res);
10
11long alloc_table_driven_reservation(
12 struct reservation_config *config,
13 struct reservation **_res);
14
15#endif \ No newline at end of file
diff --git a/include/litmus/reservations/budget-notifier.h b/include/litmus/reservations/budget-notifier.h
new file mode 100644
index 000000000000..d831fa9d5153
--- /dev/null
+++ b/include/litmus/reservations/budget-notifier.h
@@ -0,0 +1,50 @@
1#ifndef LITMUS_BUDGET_NOTIFIER_H
2#define LITMUS_BUDGET_NOTIFIER_H
3
4#include <linux/list.h>
5#include <linux/spinlock.h>
6
7struct budget_notifier;
8
9typedef void (*budget_callback_t) (
10 struct budget_notifier *bn
11);
12
13struct budget_notifier {
14 struct list_head list;
15 budget_callback_t budget_exhausted;
16 budget_callback_t budget_replenished;
17};
18
19struct budget_notifier_list {
20 struct list_head list;
21 raw_spinlock_t lock;
22};
23
24void budget_notifier_list_init(struct budget_notifier_list* bnl);
25
/* Register a budget notifier on the list, under the list's raw
 * spinlock with interrupts disabled (safe from IRQ context). */
26static inline void budget_notifier_add(
27 struct budget_notifier_list *bnl,
28 struct budget_notifier *bn)
29{
30 unsigned long flags;
31
32 raw_spin_lock_irqsave(&bnl->lock, flags);
33 list_add(&bn->list, &bnl->list);
34 raw_spin_unlock_irqrestore(&bnl->lock, flags);
35}
36
/* Unregister a budget notifier; same locking discipline as
 * budget_notifier_add(). */
37static inline void budget_notifier_remove(
38 struct budget_notifier_list *bnl,
39 struct budget_notifier *bn)
40{
41 unsigned long flags;
42
43 raw_spin_lock_irqsave(&bnl->lock, flags);
44 list_del(&bn->list);
45 raw_spin_unlock_irqrestore(&bnl->lock, flags);
46}
47
48void budget_notifiers_fire(struct budget_notifier_list *bnl, bool replenished);
49
50#endif
diff --git a/include/litmus/reservations/polling.h b/include/litmus/reservations/polling.h
new file mode 100644
index 000000000000..230e12b1088a
--- /dev/null
+++ b/include/litmus/reservations/polling.h
@@ -0,0 +1,19 @@
1#ifndef LITMUS_POLLING_RESERVATIONS_H
2#define LITMUS_POLLING_RESERVATIONS_H
3
4#include <litmus/reservations/reservation.h>
5
6struct polling_reservation {
7 /* extend basic reservation */
8 struct reservation res;
9
10 lt_t max_budget;
11 lt_t period;
12 lt_t deadline;
13 lt_t offset;
14};
15
16void polling_reservation_init(struct polling_reservation *pres, int use_edf_prio,
17 int use_periodic_polling, lt_t budget, lt_t period, lt_t deadline, lt_t offset);
18
19#endif
diff --git a/include/litmus/reservations/reservation.h b/include/litmus/reservations/reservation.h
new file mode 100644
index 000000000000..1752dac4e698
--- /dev/null
+++ b/include/litmus/reservations/reservation.h
@@ -0,0 +1,224 @@
1#ifndef LITMUS_RESERVATION_H
2#define LITMUS_RESERVATION_H
3
4#include <linux/list.h>
5#include <linux/hrtimer.h>
6
7#include <litmus/debug_trace.h>
8#include <litmus/reservations/budget-notifier.h>
9
10struct reservation_client;
11struct reservation_environment;
12struct reservation;
13
14typedef enum {
15 /* reservation has no clients, is not consuming budget */
16 RESERVATION_INACTIVE = 0,
17
18 /* reservation has clients, consumes budget when scheduled */
19 RESERVATION_ACTIVE,
20
21 /* reservation has no clients, but may be consuming budget */
22 RESERVATION_ACTIVE_IDLE,
23
24 /* Reservation has no budget and waits for
25 * replenishment. May or may not have clients. */
26 RESERVATION_DEPLETED,
27} reservation_state_t;
28
29
30/* ************************************************************************** */
31
32/* Select which task to dispatch. If NULL is returned, it means there is nothing
33 * to schedule right now and background work can be scheduled. */
34typedef struct task_struct * (*dispatch_t) (
35 struct reservation_client *client
36);
37
38/* Something that can be managed in a reservation and that can yield
39 * a process for dispatching. Contains a pointer to the reservation
40 * to which it "belongs". */
41struct reservation_client {
42 struct list_head list;
43 struct reservation* reservation;
44 dispatch_t dispatch;
45};
46
47
48/* ************************************************************************** */
49
50/* Called by reservations to request state change. */
51typedef void (*reservation_change_state_t) (
52 struct reservation_environment* env,
53 struct reservation *res,
54 reservation_state_t new_state
55);
56
57/* Called by reservations to request replenishment while not DEPLETED.
58 * Useful for soft reservations that remain ACTIVE with lower priority. */
59typedef void (*request_replenishment_t)(
60 struct reservation_environment* env,
61 struct reservation *res
62);
63
64/* The framework within which reservations operate. */
65struct reservation_environment {
66 lt_t time_zero;
67 lt_t current_time;
68
69 /* services invoked by reservations */
70 reservation_change_state_t change_state;
71 request_replenishment_t request_replenishment;
72};
73
74/* ************************************************************************** */
75
76/* A new client is added or an existing client resumes. */
77typedef void (*client_arrives_t) (
78 struct reservation *reservation,
79 struct reservation_client *client
80);
81
82/* A client suspends or terminates. */
83typedef void (*client_departs_t) (
84 struct reservation *reservation,
85 struct reservation_client *client,
86 int did_signal_job_completion
87);
88
89/* A previously requested replenishment has occurred. */
90typedef void (*on_replenishment_timer_t) (
91 struct reservation *reservation
92);
93
94/* Update the reservation's budget to reflect execution or idling. */
95typedef void (*drain_budget_t) (
96 struct reservation *reservation,
97 lt_t how_much
98);
99
100/* Select a ready task from one of the clients for scheduling. */
101typedef struct task_struct* (*dispatch_client_t) (
102 struct reservation *reservation,
103 lt_t *time_slice /* May be used to force rescheduling after
104 some amount of time. 0 => no limit */
105);
106
107/* Destructor: called before scheduler is deactivated. */
108typedef void (*shutdown_t)(struct reservation *reservation);
109
110struct reservation_ops {
111 dispatch_client_t dispatch_client;
112
113 client_arrives_t client_arrives;
114 client_departs_t client_departs;
115
116 on_replenishment_timer_t replenish;
117 drain_budget_t drain_budget;
118
119 shutdown_t shutdown;
120};
121
122#define RESERVATION_BACKGROUND_PRIORITY ULLONG_MAX
123
124struct reservation {
125 /* used to queue in environment */
126 struct list_head list;
127 struct list_head replenish_list;
128
129 reservation_state_t state;
130 unsigned int id;
131 unsigned int kind;
132
133 /* exact meaning defined by impl. */
134 lt_t priority;
135 lt_t cur_budget;
136 lt_t next_replenishment;
137
138 /* budget stats */
139 lt_t budget_consumed; /* how much budget consumed in this allocation cycle? */
140 lt_t budget_consumed_total;
141
142 /* list of registered budget callbacks */
143 struct budget_notifier_list budget_notifiers;
144
145 /* for memory reclamation purposes */
146 struct list_head all_list;
147
148 /* interaction with framework */
149 struct reservation_environment *env;
150 struct reservation_ops *ops;
151
152 struct list_head clients;
153};
154
155void reservation_init(struct reservation *res);
156
157/* Default implementations */
158
159/* simply select the first client in the list, set *for_at_most to zero */
160struct task_struct* default_dispatch_client(
161 struct reservation *res,
162 lt_t *for_at_most
163);
164
165/* drain budget at linear rate, enter DEPLETED state when budget used up */
166void common_drain_budget(struct reservation *res, lt_t how_much);
167
168/* "connector" reservation client to hook up tasks with reservations */
169struct task_client {
170 struct reservation_client client;
171 struct task_struct *task;
172};
173
174void task_client_init(struct task_client *tc, struct task_struct *task,
175 struct reservation *reservation);
176
177#define SUP_RESCHEDULE_NOW (0)
178#define SUP_NO_SCHEDULER_UPDATE (ULLONG_MAX)
179
180/* A simple uniprocessor (SUP) flat (i.e., non-hierarchical) reservation
181 * environment.
182 */
183struct sup_reservation_environment {
184 struct reservation_environment env;
185
186 /* ordered by priority */
187 struct list_head active_reservations;
188
189 /* ordered by next_replenishment */
190 struct list_head depleted_reservations;
191
192 /* unordered */
193 struct list_head inactive_reservations;
194
195 /* list of all reservations */
196 struct list_head all_reservations;
197
198 /* - SUP_RESCHEDULE_NOW means call sup_dispatch() now
199 * - SUP_NO_SCHEDULER_UPDATE means nothing to do
200 * any other value means program a timer for the given time
201 */
202 lt_t next_scheduler_update;
203 /* set to true if a call to sup_dispatch() is imminent */
204 bool will_schedule;
205};
206
207/* Contract:
208 * - before calling into sup_ code, or any reservation methods,
209 * update the time with sup_update_time(); and
210 * - after calling into sup_ code, or any reservation methods,
211 * check next_scheduler_update and program timer or trigger
212 * scheduler invocation accordingly.
213 */
214
215void sup_init(struct sup_reservation_environment* sup_env);
216void sup_add_new_reservation(struct sup_reservation_environment* sup_env,
217 struct reservation* new_res);
218void sup_update_time(struct sup_reservation_environment* sup_env, lt_t now);
219struct task_struct* sup_dispatch(struct sup_reservation_environment* sup_env);
220
221struct reservation* sup_find_by_id(struct sup_reservation_environment* sup_env,
222 unsigned int id);
223
224#endif
diff --git a/include/litmus/reservations/table-driven.h b/include/litmus/reservations/table-driven.h
new file mode 100644
index 000000000000..b6302a2f200d
--- /dev/null
+++ b/include/litmus/reservations/table-driven.h
@@ -0,0 +1,23 @@
1#ifndef LITMUS_RESERVATIONS_TABLE_DRIVEN_H
2#define LITMUS_RESERVATIONS_TABLE_DRIVEN_H
3
4#include <litmus/reservations/reservation.h>
5
/* A table-driven reservation: the reservation is scheduled during a
 * fixed set of intervals that repeats every major_cycle time units. */
struct table_driven_reservation {
	/* extend basic reservation */
	struct reservation res;

	/* length of the repeating schedule (major cycle) */
	lt_t major_cycle;
	/* index of the next entry in intervals[] to activate */
	unsigned int next_interval;
	/* number of entries in intervals[] */
	unsigned int num_intervals;
	/* scheduling slots; NOTE(review): presumably offsets relative to
	 * the start of the major cycle -- confirm against the .c file */
	struct lt_interval *intervals;

	/* info about current scheduling slot */
	struct lt_interval cur_interval;
	/* start time of the major cycle currently being executed */
	lt_t major_cycle_start;
};
19
20void table_driven_reservation_init(struct table_driven_reservation *tdres,
21 lt_t major_cycle, struct lt_interval *intervals, unsigned int num_intervals);
22
23#endif
diff --git a/include/litmus/rt_domain.h b/include/litmus/rt_domain.h
new file mode 100644
index 000000000000..ac249292e866
--- /dev/null
+++ b/include/litmus/rt_domain.h
@@ -0,0 +1,182 @@
1/* CLEANUP: Add comments and make it less messy.
2 *
3 */
4
5#ifndef __UNC_RT_DOMAIN_H__
6#define __UNC_RT_DOMAIN_H__
7
8#include <litmus/bheap.h>
9
10#define RELEASE_QUEUE_SLOTS 127 /* prime */
11
12struct _rt_domain;
13
14typedef int (*check_resched_needed_t)(struct _rt_domain *rt);
15typedef void (*release_jobs_t)(struct _rt_domain *rt, struct bheap* tasks);
16
/* Bucketed queue of pending releases. RELEASE_QUEUE_SLOTS is prime,
 * which suggests slots are selected by hashing the release time --
 * NOTE(review): confirm the hash in the rt_domain implementation. */
struct release_queue {
	/* each slot maintains a list of release heaps sorted
	 * by release time */
	struct list_head slot[RELEASE_QUEUE_SLOTS];
};
22
/* A real-time scheduling domain: a priority-ordered ready queue plus a
 * time-ordered release queue, each protected by its own raw spinlock.
 * Behavior is parameterized via the three callback members at the end. */
typedef struct _rt_domain {
	/* runnable rt tasks are in here */
	raw_spinlock_t ready_lock;
	struct bheap ready_queue;

	/* real-time tasks waiting for release are in here */
	raw_spinlock_t release_lock;
	struct release_queue release_queue;

#ifdef CONFIG_RELEASE_MASTER
	/* NOTE(review): presumably the CPU that handles release timers
	 * on behalf of the domain -- confirm usage */
	int release_master;
#endif

	/* for moving tasks to the release queue */
	raw_spinlock_t tobe_lock;
	struct list_head tobe_released;

	/* how do we check if we need to kick another CPU? */
	check_resched_needed_t check_resched;

	/* how do we release jobs? */
	release_jobs_t release_jobs;

	/* how are tasks ordered in the ready queue? */
	bheap_prio_t order;
} rt_domain_t;
49
/* A batch of tasks that share the same release time. Batching lets one
 * hrtimer fire per distinct release time instead of one per task. */
struct release_heap {
	/* list_head for per-time-slot list */
	struct list_head list;
	/* the common release time of all tasks in 'heap' */
	lt_t release_time;
	/* all tasks to be released at release_time */
	struct bheap heap;
	/* used to trigger the release */
	struct hrtimer timer;

#ifdef CONFIG_RELEASE_MASTER
	/* used to delegate releases */
	struct hrtimer_start_on_info info;
#endif
	/* required for the timer callback */
	rt_domain_t* dom;
};
66
67
68static inline struct task_struct* __next_ready(rt_domain_t* rt)
69{
70 struct bheap_node *hn = bheap_peek(rt->order, &rt->ready_queue);
71 if (hn)
72 return bheap2task(hn);
73 else
74 return NULL;
75}
76
77void rt_domain_init(rt_domain_t *rt, bheap_prio_t order,
78 check_resched_needed_t check,
79 release_jobs_t relase);
80
81void __add_ready(rt_domain_t* rt, struct task_struct *new);
82void __merge_ready(rt_domain_t* rt, struct bheap *tasks);
83void __add_release(rt_domain_t* rt, struct task_struct *task);
84
85static inline struct task_struct* __take_ready(rt_domain_t* rt)
86{
87 struct bheap_node* hn = bheap_take(rt->order, &rt->ready_queue);
88 if (hn)
89 return bheap2task(hn);
90 else
91 return NULL;
92}
93
94static inline struct task_struct* __peek_ready(rt_domain_t* rt)
95{
96 struct bheap_node* hn = bheap_peek(rt->order, &rt->ready_queue);
97 if (hn)
98 return bheap2task(hn);
99 else
100 return NULL;
101}
102
103static inline int is_queued(struct task_struct *t)
104{
105 BUG_ON(!tsk_rt(t)->heap_node);
106 return bheap_node_in_heap(tsk_rt(t)->heap_node);
107}
108
/* Remove task t from the domain's ready queue.
 * NOTE(review): no locking here -- presumably the caller must hold
 * rt->ready_lock; confirm at call sites. Also note this kernel-space
 * name shadows the C library's remove(); harmless in-kernel. */
static inline void remove(rt_domain_t* rt, struct task_struct *t)
{
	bheap_delete(rt->order, &rt->ready_queue, tsk_rt(t)->heap_node);
}
113
114static inline void add_ready(rt_domain_t* rt, struct task_struct *new)
115{
116 unsigned long flags;
117 /* first we need the write lock for rt_ready_queue */
118 raw_spin_lock_irqsave(&rt->ready_lock, flags);
119 __add_ready(rt, new);
120 raw_spin_unlock_irqrestore(&rt->ready_lock, flags);
121}
122
123static inline void merge_ready(rt_domain_t* rt, struct bheap* tasks)
124{
125 unsigned long flags;
126 raw_spin_lock_irqsave(&rt->ready_lock, flags);
127 __merge_ready(rt, tasks);
128 raw_spin_unlock_irqrestore(&rt->ready_lock, flags);
129}
130
131static inline struct task_struct* take_ready(rt_domain_t* rt)
132{
133 unsigned long flags;
134 struct task_struct* ret;
135 /* first we need the write lock for rt_ready_queue */
136 raw_spin_lock_irqsave(&rt->ready_lock, flags);
137 ret = __take_ready(rt);
138 raw_spin_unlock_irqrestore(&rt->ready_lock, flags);
139 return ret;
140}
141
142
143static inline void add_release(rt_domain_t* rt, struct task_struct *task)
144{
145 unsigned long flags;
146 raw_spin_lock_irqsave(&rt->tobe_lock, flags);
147 __add_release(rt, task);
148 raw_spin_unlock_irqrestore(&rt->tobe_lock, flags);
149}
150
151#ifdef CONFIG_RELEASE_MASTER
152void __add_release_on(rt_domain_t* rt, struct task_struct *task,
153 int target_cpu);
154
155static inline void add_release_on(rt_domain_t* rt,
156 struct task_struct *task,
157 int target_cpu)
158{
159 unsigned long flags;
160 raw_spin_lock_irqsave(&rt->tobe_lock, flags);
161 __add_release_on(rt, task, target_cpu);
162 raw_spin_unlock_irqrestore(&rt->tobe_lock, flags);
163}
164#endif
165
166static inline int __jobs_pending(rt_domain_t* rt)
167{
168 return !bheap_empty(&rt->ready_queue);
169}
170
171static inline int jobs_pending(rt_domain_t* rt)
172{
173 unsigned long flags;
174 int ret;
175 /* first we need the write lock for rt_ready_queue */
176 raw_spin_lock_irqsave(&rt->ready_lock, flags);
177 ret = !bheap_empty(&rt->ready_queue);
178 raw_spin_unlock_irqrestore(&rt->ready_lock, flags);
179 return ret;
180}
181
182#endif
diff --git a/include/litmus/rt_param.h b/include/litmus/rt_param.h
index 8c7869f46bfb..9b291343714f 100644
--- a/include/litmus/rt_param.h
+++ b/include/litmus/rt_param.h
@@ -62,6 +62,7 @@ typedef enum {
62#define LITMUS_MAX_PRIORITY 512 62#define LITMUS_MAX_PRIORITY 512
63#define LITMUS_HIGHEST_PRIORITY 1 63#define LITMUS_HIGHEST_PRIORITY 1
64#define LITMUS_LOWEST_PRIORITY (LITMUS_MAX_PRIORITY - 1) 64#define LITMUS_LOWEST_PRIORITY (LITMUS_MAX_PRIORITY - 1)
65#define LITMUS_NO_PRIORITY UINT_MAX
65 66
66/* Provide generic comparison macros for userspace, 67/* Provide generic comparison macros for userspace,
67 * in case that we change this later. */ 68 * in case that we change this later. */
@@ -71,6 +72,46 @@ typedef enum {
71 ((p) >= LITMUS_HIGHEST_PRIORITY && \ 72 ((p) >= LITMUS_HIGHEST_PRIORITY && \
72 (p) <= LITMUS_LOWEST_PRIORITY) 73 (p) <= LITMUS_LOWEST_PRIORITY)
73 74
75/* reservation support */
76
/* Kinds of reservation a plugin may create via reservation_create().
 * NOTE(review): values start at 10, presumably to keep them disjoint
 * from other small integer codes passed over the same interface --
 * confirm against the syscall backend. */
typedef enum {
	PERIODIC_POLLING = 10,
	SPORADIC_POLLING,
	TABLE_DRIVEN,
} reservation_type_t;
82
/* A time interval [start, end] in lt_t time units.
 * NOTE(review): whether 'end' is inclusive is not visible here --
 * confirm in the table-driven reservation implementation. */
struct lt_interval {
	lt_t start;
	lt_t end;
};
87
88#ifndef __KERNEL__
89#define __user
90#endif
91
/* Userspace-supplied parameters for creating a reservation. Which
 * member of the union is valid depends on the reservation_type_t
 * passed alongside this struct. */
struct reservation_config {
	/* caller-chosen reservation identifier */
	unsigned int id;
	lt_t priority;
	/* CPU the reservation is assigned to */
	int cpu;

	union {
		/* for PERIODIC_POLLING / SPORADIC_POLLING */
		struct {
			lt_t period;
			lt_t budget;
			lt_t relative_deadline;
			lt_t offset;
		} polling_params;

		/* for TABLE_DRIVEN */
		struct {
			lt_t major_cycle_length;
			unsigned int num_intervals;
			/* userspace pointer to num_intervals entries */
			struct lt_interval __user *intervals;
		} table_driven_params;
	};
};
112
113/* regular sporadic task support */
114
74struct rt_task { 115struct rt_task {
75 lt_t exec_cost; 116 lt_t exec_cost;
76 lt_t period; 117 lt_t period;
@@ -83,54 +124,6 @@ struct rt_task {
83 release_policy_t release_policy; 124 release_policy_t release_policy;
84}; 125};
85 126
86union np_flag {
87 uint64_t raw;
88 struct {
89 /* Is the task currently in a non-preemptive section? */
90 uint64_t flag:31;
91 /* Should the task call into the scheduler? */
92 uint64_t preempt:1;
93 } np;
94};
95
96/* The definition of the data that is shared between the kernel and real-time
97 * tasks via a shared page (see litmus/ctrldev.c).
98 *
99 * WARNING: User space can write to this, so don't trust
100 * the correctness of the fields!
101 *
102 * This servees two purposes: to enable efficient signaling
103 * of non-preemptive sections (user->kernel) and
104 * delayed preemptions (kernel->user), and to export
105 * some real-time relevant statistics such as preemption and
106 * migration data to user space. We can't use a device to export
107 * statistics because we want to avoid system call overhead when
108 * determining preemption/migration overheads).
109 */
110struct control_page {
111 /* This flag is used by userspace to communicate non-preempive
112 * sections. */
113 volatile union np_flag sched;
114
115 volatile uint64_t irq_count; /* Incremented by the kernel each time an IRQ is
116 * handled. */
117
118 /* Locking overhead tracing: userspace records here the time stamp
119 * and IRQ counter prior to starting the system call. */
120 uint64_t ts_syscall_start; /* Feather-Trace cycles */
121 uint64_t irq_syscall_start; /* Snapshot of irq_count when the syscall
122 * started. */
123
124 /* to be extended */
125};
126
127/* Expected offsets within the control page. */
128
129#define LITMUS_CP_OFFSET_SCHED 0
130#define LITMUS_CP_OFFSET_IRQ_COUNT 8
131#define LITMUS_CP_OFFSET_TS_SC_START 16
132#define LITMUS_CP_OFFSET_IRQ_SC_START 24
133
134/* don't export internal data structures to user space (liblitmus) */ 127/* don't export internal data structures to user space (liblitmus) */
135#ifdef __KERNEL__ 128#ifdef __KERNEL__
136 129
@@ -177,9 +170,6 @@ struct pfair_param;
177 * be explicitly set up before the task set is launched. 170 * be explicitly set up before the task set is launched.
178 */ 171 */
179struct rt_param { 172struct rt_param {
180 /* Generic flags available for plugin-internal use. */
181 unsigned int flags:8;
182
183 /* do we need to check for srp blocking? */ 173 /* do we need to check for srp blocking? */
184 unsigned int srp_non_recurse:1; 174 unsigned int srp_non_recurse:1;
185 175
@@ -207,13 +197,19 @@ struct rt_param {
207 /* timing parameters */ 197 /* timing parameters */
208 struct rt_job job_params; 198 struct rt_job job_params;
209 199
200
201 /* Special handling for periodic tasks executing
202 * clock_nanosleep(CLOCK_MONOTONIC, ...).
203 */
204 lt_t nanosleep_wakeup;
205 unsigned int doing_abs_nanosleep:1;
206
210 /* Should the next job be released at some time other than 207 /* Should the next job be released at some time other than
211 * just period time units after the last release? 208 * just period time units after the last release?
212 */ 209 */
213 unsigned int sporadic_release:1; 210 unsigned int sporadic_release:1;
214 lt_t sporadic_release_time; 211 lt_t sporadic_release_time;
215 212
216
217 /* task representing the current "inherited" task 213 /* task representing the current "inherited" task
218 * priority, assigned by inherit_priority and 214 * priority, assigned by inherit_priority and
219 * return priority in the scheduler plugins. 215 * return priority in the scheduler plugins.
@@ -255,7 +251,10 @@ struct rt_param {
255 volatile int linked_on; 251 volatile int linked_on;
256 252
257 /* PFAIR/PD^2 state. Allocated on demand. */ 253 /* PFAIR/PD^2 state. Allocated on demand. */
258 struct pfair_param* pfair; 254 union {
255 void *plugin_state;
256 struct pfair_param *pfair;
257 };
259 258
260 /* Fields saved before BE->RT transition. 259 /* Fields saved before BE->RT transition.
261 */ 260 */
diff --git a/include/litmus/sched_plugin.h b/include/litmus/sched_plugin.h
new file mode 100644
index 000000000000..0923f26b745a
--- /dev/null
+++ b/include/litmus/sched_plugin.h
@@ -0,0 +1,180 @@
1/*
2 * Definition of the scheduler plugin interface.
3 *
4 */
5#ifndef _LINUX_SCHED_PLUGIN_H_
6#define _LINUX_SCHED_PLUGIN_H_
7
8#include <linux/sched.h>
9
10#ifdef CONFIG_LITMUS_LOCKING
11#include <litmus/locking.h>
12#endif
13
14/************************ setup/tear down ********************/
15
16typedef long (*activate_plugin_t) (void);
17typedef long (*deactivate_plugin_t) (void);
18
19struct domain_proc_info;
20typedef long (*get_domain_proc_info_t) (struct domain_proc_info **info);
21
22
23/********************* scheduler invocation ******************/
24/* The main scheduling function, called to select the next task to dispatch. */
25typedef struct task_struct* (*schedule_t)(struct task_struct * prev);
26/* Clean up after the task switch has occured.
27 * This function is called after every (even non-rt) task switch.
28 */
29typedef void (*finish_switch_t)(struct task_struct *prev);
30
31
32/* When waiting for the stack of the task selected by the plugin
33 * to become available, this callback is invoked to give the
34 * plugin a chance to cancel the wait. If the plugin returns false,
35 * the scheduler is invoked again. */
36typedef bool (*should_wait_for_stack_t)(struct task_struct *next);
37
38/* After a pull migration (which involves dropping scheduler locks),
39 * the plugin is given the chance to validate that the task is still
40 * the right one. If the plugin returns false, the scheduler
41 * will be invoked again. */
42typedef bool (*post_migration_validate_t)(struct task_struct *next);
43
44/* After dropping the lock to facilitate a pull migration, the task
45 * state may have changed. In this case, the core notifies the plugin
46 * with this callback and then invokes the scheduler again. */
47typedef void (*next_became_invalid_t)(struct task_struct *next);
48
49/********************* task state changes ********************/
50
51/* Called to setup a new real-time task.
52 * Release the first job, enqueue, etc.
53 * Task may already be running.
54 */
55typedef void (*task_new_t) (struct task_struct *task,
56 int on_rq,
57 int running);
58
59/* Called when userspace seeks to set new task parameters for a task
60 * that is already in real-time mode (i.e., is_realtime(task)).
61 */
62typedef long (*task_change_params_t) (struct task_struct *task,
63 struct rt_task *new_params);
64
65/* Called to re-introduce a task after blocking.
66 * Can potentially be called multiple times.
67 */
68typedef void (*task_wake_up_t) (struct task_struct *task);
69/* called to notify the plugin of a blocking real-time task
70 * it will only be called for real-time tasks and before schedule is called */
71typedef void (*task_block_t) (struct task_struct *task);
72/* Called when a real-time task exits or changes to a different scheduling
73 * class.
74 * Free any allocated resources
75 */
76typedef void (*task_exit_t) (struct task_struct *);
77
78/* task_exit() is called with interrupts disabled and runqueue locks held, and
79 * thus and cannot block or spin. task_cleanup() is called sometime later
80 * without any locks being held.
81 */
82typedef void (*task_cleanup_t) (struct task_struct *);
83
84#ifdef CONFIG_LITMUS_LOCKING
85/* Called when the current task attempts to create a new lock of a given
86 * protocol type. */
87typedef long (*allocate_lock_t) (struct litmus_lock **lock, int type,
88 void* __user config);
89#endif
90
91
92/********************* sys call backends ********************/
93/* This function causes the caller to sleep until the next release */
94typedef long (*complete_job_t) (void);
95
96typedef long (*admit_task_t)(struct task_struct* tsk);
97
98/* return false to indicate that the plugin does not support forking */
99typedef bool (*fork_task_t)(struct task_struct* tsk);
100
101typedef long (*wait_for_release_at_t)(lt_t release_time);
102
103/* Informs the plugin when a synchronous release takes place. */
104typedef void (*synchronous_release_at_t)(lt_t time_zero);
105
106/* How much budget has the current task consumed so far, and how much
107 * has it left? The default implementation ties into the per-task
108 * budget enforcement code. Plugins can override this to report
109 * reservation-specific values. */
110typedef void (*current_budget_t)(lt_t *used_so_far, lt_t *remaining);
111
112/* Reservation creation/removal backends. Meaning of reservation_type and
113 * reservation_id are entirely plugin-specific. */
114typedef long (*reservation_create_t)(int reservation_type, void* __user config);
115typedef long (*reservation_destroy_t)(unsigned int reservation_id, int cpu);
116
117/************************ misc routines ***********************/
118
119
/* A LITMUS^RT scheduler plugin: a table of callbacks that implements
 * one scheduling policy. Registered via register_sched_plugin(); the
 * currently active plugin is reachable through the global 'litmus'
 * pointer. Optional callbacks may be left NULL (NOTE(review): inferred
 * from "optional" below -- confirm which the core checks for NULL). */
struct sched_plugin {
	/* linked into the global list of registered plugins */
	struct list_head list;
	/* basic info */
	char *plugin_name;

	/* setup */
	activate_plugin_t activate_plugin;
	deactivate_plugin_t deactivate_plugin;
	get_domain_proc_info_t get_domain_proc_info;

	/* scheduler invocation */
	schedule_t schedule;
	finish_switch_t finish_switch;

	/* control over pull migrations */
	should_wait_for_stack_t should_wait_for_stack;
	next_became_invalid_t next_became_invalid;
	post_migration_validate_t post_migration_validate;

	/* syscall backend */
	complete_job_t complete_job;
	wait_for_release_at_t wait_for_release_at;
	synchronous_release_at_t synchronous_release_at;

	/* task state changes */
	admit_task_t admit_task;
	fork_task_t fork_task;

	task_new_t task_new;
	task_wake_up_t task_wake_up;
	task_block_t task_block;

	/* optional: support task parameter changes at runtime */
	task_change_params_t task_change_params;

	task_exit_t task_exit;
	task_cleanup_t task_cleanup;

	current_budget_t current_budget;

	/* Reservation support */
	reservation_create_t reservation_create;
	reservation_destroy_t reservation_destroy;

#ifdef CONFIG_LITMUS_LOCKING
	/* locking protocols */
	allocate_lock_t allocate_lock;
#endif
	/* cache-line aligned, presumably to avoid false sharing between
	 * adjacent plugin structures -- NOTE(review): confirm intent */
} __attribute__ ((__aligned__(SMP_CACHE_BYTES)));
169
170
171extern struct sched_plugin *litmus;
172
173int register_sched_plugin(struct sched_plugin* plugin);
174struct sched_plugin* find_sched_plugin(const char* name);
175void print_sched_plugins(struct seq_file *m);
176
177
178extern struct sched_plugin linux_sched_plugin;
179
180#endif
diff --git a/include/litmus/srp.h b/include/litmus/srp.h
new file mode 100644
index 000000000000..c9a4552b2bf3
--- /dev/null
+++ b/include/litmus/srp.h
@@ -0,0 +1,28 @@
1#ifndef LITMUS_SRP_H
2#define LITMUS_SRP_H
3
4struct srp_semaphore;
5
/* Stack Resource Policy (SRP) ceiling record: associates a preemption
 * level with the task (pid) that contributes it. */
struct srp_priority {
	/* NOTE(review): presumably linked into a priority-ordered list
	 * maintained by litmus/srp.c -- confirm */
	struct list_head list;
	unsigned int priority;
	pid_t pid;
};
/* recover the containing srp_priority from its list_head */
#define list2prio(l) list_entry(l, struct srp_priority, list)
12
/* struct for uniprocessor SRP "semaphore"
 * (in quotes because under SRP a resource is guarded by a priority
 * ceiling rather than by a conventional blocking semaphore) */
struct srp_semaphore {
	/* generic LITMUS^RT lock header */
	struct litmus_lock litmus_lock;
	/* priority ceiling of this resource */
	struct srp_priority ceiling;
	/* current holder; NOTE(review): presumably NULL when free --
	 * confirm in the implementation */
	struct task_struct* owner;
	int cpu; /* cpu associated with this "semaphore" and resource */
};
20
21/* map a task to its SRP preemption level priority */
22typedef unsigned int (*srp_prioritization_t)(struct task_struct* t);
23/* Must be updated by each plugin that uses SRP.*/
24extern srp_prioritization_t get_srp_prio;
25
26struct srp_semaphore* allocate_srp_semaphore(void);
27
28#endif
diff --git a/include/litmus/wait.h b/include/litmus/wait.h
new file mode 100644
index 000000000000..ce1347c355f8
--- /dev/null
+++ b/include/litmus/wait.h
@@ -0,0 +1,57 @@
1#ifndef _LITMUS_WAIT_H_
2#define _LITMUS_WAIT_H_
3
4struct task_struct* __waitqueue_remove_first(wait_queue_head_t *wq);
5
/* wrap regular wait_queue_t head with a priority, so waiters can be
 * kept in priority order rather than FIFO order */
struct __prio_wait_queue {
	struct __wait_queue wq;

	/* priority point used to order waiters
	 * NOTE(review): whether lower or higher values win is not
	 * visible here -- confirm in __add_wait_queue_prio_exclusive() */
	lt_t priority;
	/* break ties in priority by lower tie_breaker */
	unsigned int tie_breaker;
};
15
16typedef struct __prio_wait_queue prio_wait_queue_t;
17
18static inline void init_prio_waitqueue_entry(prio_wait_queue_t *pwq,
19 struct task_struct* t,
20 lt_t priority)
21{
22 init_waitqueue_entry(&pwq->wq, t);
23 pwq->priority = priority;
24 pwq->tie_breaker = 0;
25}
26
27static inline void init_prio_waitqueue_entry_tie(prio_wait_queue_t *pwq,
28 struct task_struct* t,
29 lt_t priority,
30 unsigned int tie_breaker)
31{
32 init_waitqueue_entry(&pwq->wq, t);
33 pwq->priority = priority;
34 pwq->tie_breaker = tie_breaker;
35}
36
37unsigned int __add_wait_queue_prio_exclusive(
38 wait_queue_head_t* head,
39 prio_wait_queue_t *new);
40
41static inline unsigned int add_wait_queue_prio_exclusive(
42 wait_queue_head_t* head,
43 prio_wait_queue_t *new)
44{
45 unsigned long flags;
46 unsigned int passed;
47
48 spin_lock_irqsave(&head->lock, flags);
49 passed = __add_wait_queue_prio_exclusive(head, new);
50
51 spin_unlock_irqrestore(&head->lock, flags);
52
53 return passed;
54}
55
56
57#endif