From 4f97e3e478b4b248d993bce56c1c6bb737decbbe Mon Sep 17 00:00:00 2001
From: Jonathan Herman <hermanjl@cs.unc.edu>
Date: Tue, 19 Mar 2013 15:06:36 -0400
Subject: Added README.

---
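Note on the TYPE/TYPE directories described in section IV of the README
below. The following is a minimal sketch of how the two aggregation levels
differ, assuming the first TYPE is applied across task sets and the second
within each task set, as the README's miss-ratio example states. The names
`task_sets` and `summarize` and all numbers are hypothetical; this is not
the code parse_exps.py uses.

```python
from statistics import mean

# Hypothetical per-task miss ratios, one inner list per task set.
# The numbers are made up for illustration.
task_sets = [
    [0.0, 0.25, 0.5],
    [0.25, 0.125, 0.0],
    [0.0, 0.0, 0.75],
]


def summarize(task_sets, across_sets, within_set):
    """Reduce each task set with within_set, then reduce those results
    with across_sets (mirrors a FIELD/PARAM/<across>/<within> path)."""
    per_set = [within_set(tasks) for tasks in task_sets]
    return across_sets(per_set)


# miss-ratio/*/Max/Avg: maximum of the per-task-set averages.
max_of_avg = summarize(task_sets, max, mean)
# miss-ratio/*/Avg/Max: average of the per-task-set maxima.
avg_of_max = summarize(task_sets, mean, max)

print(max_of_avg, avg_of_max)
```

With these made-up inputs the two orderings give different values, which is
why the README keeps Max/Avg and Avg/Max in separate directories.
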
 README | 507 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 507 insertions(+)
 create mode 100644 README

diff --git a/README b/README
new file mode 100644
index 0000000..e1a0815
--- /dev/null
+++ b/README
@@ -0,0 +1,507 @@
+I. INTRODUCTION
+These scripts provide a common way to create, run, parse, and plot
+experiments under LITMUS^RT. They are designed with the following
+principles in mind:
+
+1. Little or no configuration: all scripts use certain parameters to
+configure behavior. However, if the user does not give these
+parameters, the scripts will examine the properties of the user's
+system to pick a suitable default. Requiring user input is a last
+resort.
+
+2. Interruptibility: the scripts save their work as they evaluate
+multiple directories. When the scripts are interrupted, or if new data
+is added to those directories, the scripts can be re-run and they will
+resume where they left off. This vastly decreases turnaround time for
+testing new features.
+
+3. Maximum Safety: where possible, scripts save metadata in their output
+directories about the data contained. This metadata can be used by
+the other scripts to safely use the data later.
+
+4. Independence / legacy support: none of these scripts assume their
+input was generated by another of these scripts. Three are designed to
+recognize generic input formats inspired by past LITMUS^RT
+experimental setups. (The exception to this is gen_exps.py, which
+has only user input and creates output only for run_exps.py.)
+
+5. Save everything: all output and parameters (even from subprocesses)
+are saved for debugging / reproducibility. This data is saved in tmp/
+directories while scripts are running in case scripts fail.
+
+These scripts require that the following repos are in the user's PATH:
+1. liblitmus - for real-time executable simulation and task set release
+2. feather-trace-tools - for recording and parsing overheads and
+   scheduling events
+
+Optionally, additional features will be enabled if these repos are
+present in the PATH:
+1. rt-kernelshark - to record ftrace events for kernelshark visualization
+2. sched_trace - to output a file containing scheduling events as
+   strings
+
+Each of these scripts is designed to operate independently of the
+others. For example, parse_exps.py will find any feather-trace
+files resembling ft-xyz.bin or xyz.ft and print out overhead
+statistics for the records inside. However, the scripts provide the
+most features (especially safety) when their results are chained
+together, like so:
+
+gen_exps.py --> [exps/*] --> run_exps.py  --> [run-data/*] --.
+.------------------------------------------------------------'
+'--> parse_exps.py --> [parse-data/*] --> plot_exps.py --> [plot-data/*.pdf]
+
+0. Create experiments with gen_exps.py or some other script.
+1. Run experiments using run_exps.py, generating binary files in run-data/.
+2. Parse binary data in run-data using parse_exps.py, generating csv
+   files in parse-data/.
+3. Plot parse-data using plot_exps.py, generating pdfs in plot-data.
+
+Each of these scripts is described below. The run_exps.py script comes
+first because gen_exps.py creates schedule files meant for run_exps.py.
+
+
+II. RUN_EXPS
+Usage: run_exps.py [OPTIONS] [SCHED_FILE]... [SCHED_DIR]...
+       where a SCHED_DIR resembles:
+         SCHED_DIR/
+            SCHED_FILE
+            PARAM_FILE
+
+Output: OUT_DIR/[files] or OUT_DIR/SCHED_DIR/[files] or
+        OUT_DIR/SCHED_FILE/[files] depending on input
+        If all features are enabled, these files are:
+        OUT_DIR/[.*/]
+           trace.slog    # LITMUS logging
+           st-[1..m].bin # sched_trace data
+           ft.bin        # feather-trace overhead data
+           trace.dat     # ftrace data for kernelshark
+           params.py     # Schedule parameters
+           exec-out.txt  # Standard out from schedule processes
+           exec-err.txt  # Standard err
+
+Defaults: SCHED_FILE = sched.py, PARAM_FILE = params.py,
+          DURATION = 30, OUT_DIR = run-data/
+
+The run_exps.py script reads schedule files and executes real-time
+task systems, recording all overhead, logging, and trace data which is
+enabled in the system. For example, if trace logging is enabled,
+rt-kernelshark is found in the path, but feather-trace is disabled
+(the devices are not present), only trace-logs and kernelshark logs
+will be recorded.
+
+When run_exps.py is running a schedule file, temporary data is saved
+in a 'tmp' directory in the same directory as the schedule file. When
+execution completes, this data is moved into a directory under the
+run_exps.py output directory (default: 'run-data/', can be changed with
+the -o option). When multiple schedules are run, each schedule's data
+is saved in a unique directory under the output directory.
+
+If a schedule has been run and its data is in the output directory,
+run_exps.py will not re-run the schedule unless the -f option is
+specified. This is useful if your system crashes midway through a set
+of experiments.
+
+Schedule files have one of the following two formats:
+
+a) simple format
+       path/to/proc{proc_value}
+       ...
+       path/to/proc{proc_value}
+       [real_time_task: default rtspin] task_arguments...
+       ...
+       [real_time_task] task_arguments...
+
+b) python format
+       {'proc':[
+           ('path/to/proc','proc_value'),
+            ...,
+           ('path/to/proc','proc_value')
+        ],
+        'spin':[
+           ('real_time_task', 'task_arguments'),
+            ...
+           ('real_time_task', 'task_arguments')
+        ]
+       }
+
+The following creates a simple 3-task system with a total utilization
+of about 1.92 (10/20 + 30/40 + 60/90), run under the GSN-EDF plugin:
+
+$ echo "10 20
+30 40
+60 90" > test.sched
+$ run_exps.py -s GSN-EDF test.sched
+
+The following sets a release master by writing 2 to
+/proc/litmus/release_master:
+
+$ echo "release_master{2}
+10 20" > test.sched && run_exps.py -s GSN-EDF test.sched
+
+A longer form can be used for proc entries not in /proc/litmus:
+
+$ echo "/proc/sys/something{hello}
+10 20" > test.sched
+
+You can also run your own spin program instead of rtspin by putting
+its name at the beginning of the line:
+
+$ echo "colorspin -f color1.csv 10 20" > test.sched
+
+This example also shows how you can reference files in the same
+directory as the schedule file on the command line.
+
+You can specify parameters for an experiment in a file instead of on
+the command line using params.py (the -p option lets you choose the
+name of this file if params.py is not for you):
+
+$ echo "{'scheduler':'GSN-EDF', 'duration':10}" > params.py
+$ run_exps.py test.sched
+
+You can also run multiple experiments with a single command, provided
+a directory with a schedule file exists for each. By default, the
+program will look for sched.py for the schedule file and params.py for
+the parameter file, but this behavior can be changed using the -p and
+-c options.
+
+You can include extra parameters which run_exps.py does not
+understand in params.py. These parameters will be saved with the data
+output by run_exps.py. This is useful for tracking variations in
+system parameters versus experimental results.
+
+In the following example, multiple experiments are demonstrated and an
+extra parameter 'test-param' is included:
+
+$ mkdir test1
+# The duration will default to 30 and need not be specified
+$ echo "{'scheduler':'C-EDF', 'test-param':1}" > test1/params.py
+$ echo "10 20" > test1/sched.py
+$ cp -r test1 test2
+$ echo "{'scheduler':'GSN-EDF', 'test-param':2}"> test2/params.py
+$ run_exps.py test*
+
+Finally, you can specify system properties in params.py which the
+environment must match for the experiment to run. These are useful if
+you have a large batch of experiments which must be run under
+different kernels. The first property is a regular expression for the
+uname of the system:
+
+$ uname -r
+3.0.0-litmus
+$ cp params.py old_params.py
+$ echo "{'uname': r'.*linux.*'}" >> params.py
+# run_exps.py will now complain of an invalid environment for this
+# experiment
+$ cp old_params.py params.py
+$ echo "{'uname': r'.*litmus.*'}" >> params.py
+# run_exps.py will now succeed
+
+The second property is a set of kernel configuration options. These
+assume the configuration is stored at /boot/config-`uname -r`. You can
+specify them like so:
+
+$ echo "{'config-options':{
+'RELEASE_MASTER' : 'y',
+'ARM' : 'y'}}" >> params.py
+# Only executes on ARM systems with the release master enabled
+
+
+III. GEN_EXPS
+Usage: gen_exps.py [options] [files...] [generators...] [param=val[,val]...]
+Output:   exps/EXP_DIRS which each contain sched.py and params.py
+Defaults: generators = G-EDF P-EDF C-EDF
+
+The gen_exps.py script uses 'generators', one for each supported
+LITMUS scheduler. Each generator has its own properties, which can be
+varied to generate different types of schedules. Each property has a
+default value which can be overridden on the command line for quick
+and easy experiment generation.
+
+This script as written should be used to create debugging task sets,
+but not for creating task sets for experiments shown in papers. That
+is because the safety features of run_exps.py described above (uname,
+config-options) are not used here. If you are creating experiments for
+a paper, you should create your own generator which outputs values for
+'config-options' required for your plugin so that you cannot ruin your
+experiments at run time.
+
+The -l option lists the supported generators which can be specified:
+
+$ gen_exps.py -l
+G-EDF, P-EDF, C-EDF
+
+The -d option will describe the properties of a generator or
+generators and their default values. Note that some of these defaults
+vary depending on the system on which the script is run. For example,
+'cpus' defaults to the number of cpus on the current system, in this
+example 24.
+
+$ gen_exps.py -d G-EDF,P-EDF
+Generator GSN-EDF:
+        num_tasks -- Number of tasks per experiment.
+                Default: [24, 48, 72, 96]
+                Allowed: <type 'int'>
+	....
+
+Generator PSN-EDF:
+        num_tasks -- Number of tasks per experiment.
+                Default: [24, 48, 72, 96]
+                Allowed: <type 'int'>
+        cpus -- Number of processors on target system.
+                Default: [24]
+                Allowed: <type 'int'>
+	....
+
+You create experiments by specifying a generator. The following will
+create 4 schedules with 24, 48, 72, and 96 tasks, because the default
+value of num_tasks is an array of these values:
+
+$ gen_exps.py G-EDF
+$ ls exps/
+sched=GSN-EDF_num-tasks=24/  sched=GSN-EDF_num-tasks=48/
+sched=GSN-EDF_num-tasks=72/  sched=GSN-EDF_num-tasks=96/
+
+You can modify the default using a single value (the -f option deletes
+previous experiments in the output directory, defaulting to 'exps/',
+changeable with -o):
+
+$ gen_exps.py -f G-EDF num_tasks=24
+$ ls exps/
+sched=GSN-EDF_num-tasks=24/
+
+Or with an array of values, specified as a comma-separated list:
+
+$ gen_exps.py -f num_tasks=`seq -s, 24 2 30` P-EDF && ls exps/
+sched=PSN-EDF_num-tasks=24/  sched=PSN-EDF_num-tasks=26/
+sched=PSN-EDF_num-tasks=28/  sched=PSN-EDF_num-tasks=30/
+
+The generator will create a different directory for each possible
+configuration of the parameters. Each parameter which is varied is
+included in the name of the schedule directory. For example, to vary
+the number of CPUs but not the number of tasks:
+
+$ gen_exps.py -f num_tasks=24 cpus=3,6 P-EDF
+$ ls exps
+sched=PSN-EDF_cpus=3/  sched=PSN-EDF_cpus=6/
+
+The values of non-varying parameters are still saved in
+params.py. Continuing the example above:
+
+$ cat exps/sched\=PSN-EDF_cpus\=3/params.py
+{'periods': 'harmonic', 'release_master': False, 'duration': 30,
+ 'utils': 'uni-medium', 'scheduler': 'PSN-EDF', 'cpus': 3}
+
+You can also have multiple schedules generated with the same
+configuration using the -n option:
+
+$ gen_exps.py -f num_tasks=24 -n 5 P-EDF
+$ ls exps/
+sched=PSN-EDF_trial=0/  sched=PSN-EDF_trial=1/  sched=PSN-EDF_trial=2/
+sched=PSN-EDF_trial=3/  sched=PSN-EDF_trial=4/
+
+
+IV. PARSE_EXPS
+Usage: parse_exps.py [options] [data_dir1] [data_dir2]...
+       where data_dirs contain feather-trace and sched-trace data,
+       e.g. ft.bin, mysched.ft, or st-*.bin.
+
+Output: print out all parsed data or
+        OUT_FILE where OUT_FILE is a python map of the data or
+        OUT_DIR/[FIELD]*/[PARAM]/[TYPE]/[TYPE]/[LINE].csv
+
+        The goal is to create csv files which record how varying PARAM
+        changes the value of FIELD. Only PARAMs which vary are
+        considered.
+
+        FIELD is a parsed value, e.g. 'RELEASE' overhead or 'miss-ratio'
+        PARAM is a parameter which we are going to vary, e.g. 'tasks'
+        A single LINE is created for every configuration of parameters
+        other than PARAM.
+
+        TYPE is the type of measurement, i.e. Max, Min, Avg, or
+        Var[iance]. Two TYPEs appear in each path: the second is
+        applied across the tasks within a single task set, and the
+        first across all task sets. E.g. miss-ratio/*/Max/Avg is the
+        maximum of the average miss ratios of each task set, while
+        miss-ratio/*/Avg/Max is the average of the maximum miss ratios
+        of each task set.
+
+Defaults: OUT_DIR or OUT_FILE = parse-data, data_dir1 = '.'
+
+The parse_exps.py script reads a directory or directories, parses the
+binary files inside for feather-trace or sched-trace data, then
+summarizes and organizes the results for output. The output can go to
+the console, to a python map, or to a directory tree of csvs (the
+default when enough data is found). The python map (using -m) can be
+used for schedulability tests. The directory tree can be used to look
+at how changing parameters affects certain measurements.
+
+The script will use half of the current computer's CPUs to process data.
+
+In the following example, too little data was found to create csv
+files, so the data is output to the console despite the user not
+specifying the -v option. This use is the easiest for quick overhead
+evaluation and debugging. Note that for overhead measurements like
+these, parse_exps.py will use the 'clock-frequency' parameter saved in
+a params.py file by run_exps.py to calculate overhead measurements. If
+a param file is not present, as in this case, the current CPU's
+frequency will be used.
+
+$ ls run-data/
+taskset_scheduler=C-FL-split-L3_host=ludwig_n=10_idx=05_split=randsplit.ft
+$ parse_exps.py run-data/
+Loading experiments...
+Parsing data...
+ 0.00%
+Writing result...
+Too little data to make csv files.
+<ExpPoint-/home/hermanjl/tmp/run-data>
+                 CXS:  Avg:     5.053  Max:    59.925  Min:     0.241
+               SCHED:  Avg:     4.410  Max:    39.350  Min:     0.357
+                TICK:  Avg:     1.812  Max:    21.380  Min:     0.241
+
+In the next example, because the value of num-tasks varies, csvs can
+be created:
+
+$ ls run-data/
+sched=C-EDF_num-tasks=4/   sched=GSN-EDF_num-tasks=4/
+sched=C-EDF_num-tasks=8/   sched=GSN-EDF_num-tasks=8/
+sched=C-EDF_num-tasks=12/  sched=GSN-EDF_num-tasks=12/
+sched=C-EDF_num-tasks=16/  sched=GSN-EDF_num-tasks=16/
+$ parse_exps.py run-data/*
+$ ls parse-data/
+avg-block/  avg-tard/  max-block/  max-tard/  miss-ratio/
+
+The varying parameters were found by reading the params.py files under
+each run-data subdirectory.
+
+You can use the -v option to print out the values measured even when
+csvs could be created.
+
+You can use the -i option to ignore variations in a certain parameter
+(or parameters if a comma-separated list is given). In the following
+example, the user has decided that the 'option' parameter does not
+matter after viewing output. Note that the 'trial' parameter, used by
+gen_exps.py to create multiple schedules with the same configuration,
+is always ignored.
+
+$ ls run-data/
+sched=C-EDF_num-tasks=4_option=1/ sched=C-EDF_num-tasks=4_option=2/
+sched=C-EDF_num-tasks=8_option=1/ sched=C-EDF_num-tasks=8_option=2/
+$ parse_exps.py run-data/*
+$ for i in parse-data/miss-ratio/tasks/Avg/Avg/*; do basename $i; \
+  cat $i; done
+option=1.csv
+ 4 .1
+ 8 .2
+option=2.csv
+ 4 .2
+ 8 .4
+# Now ignore 'option' for more accurate results
+$ parse_exps.py -i option run-data/*
+$ for i in parse-data/miss-ratio/tasks/Avg/Avg/*; do basename $i; \
+  cat $i; done
+line.csv
+ 4 .2
+ 8 .3
+
+The second command will also have run faster than the first. This is
+because parse_exps.py will save the data it parses in tmp/ directories
+before it attempts to sort it into csvs. Parsing takes far longer than
+sorting, so this saves a lot of time. The -f flag can be used to
+re-parse files and overwrite this saved data.
+
+All output from the feather-trace-tools programs used to parse data is
+stored in the tmp/ directories created in the input directories. If
+the sched_trace repo is found in the user's PATH, st_show will be used
+to create a human-readable version of the sched-trace data which will
+also be stored there.
+
+
+V. PLOT_EXPS
+Usage: plot_exps.py [options] [csv_dir]...
+       where a csv dir is a directory or directory of directories (and
+       so on) containing csvs, like:
+       csv_dir/[subdirs/...]
+           line1.csv
+           line2.csv
+           line3.csv
+
+Outputs: OUT_DIR/[csv_dir/]*[plot]*.pdf
+         where a single plot exists for each directory of csvs, with a
+         line for each csv file in that directory. If only a
+         single csv_dir is specified, all plots are placed directly
+         under OUT_DIR.
+
+Defaults: OUT_DIR = 'plot-data/', csv_dir = '.'
+
+The plot_exps.py script takes directories of csvs (or directories
+formatted as specified below) and creates a pdf plot of each csv
+directory found. A line is drawn for each .csv file in that
+directory. Matplotlib is used to do the plotting.
+
+If the csv files are formatted like:
+
+  param=value_param2=value2.csv
+
+the variation of these parameters will be used to color the lines in
+the most readable way. For instance, if there are three parameters,
+variations in one parameter will change line color, another line
+style (dashes/dots/etc), and a third line markers
+(triangles/circles/etc).
+
+If a directory of directories is passed in, the script will assume the
+top level directory is the measured value and the next level is the
+variable, i.e.:
+
+  value/variable/[..../]line.csv
+
+The plot is then titled "Value by variable (...)". Otherwise, the name
+of the top level directory will be the title, like "Value".
+
+A directory with some lines:
+$ ls
+line1.csv line2.csv
+$ plot_exps.py
+$ ls plot-data/
+plot.pdf
+
+A directory with a few subdirectories:
+$ ls test/
+apples/ oranges/
+$ ls test/apples/
+line1.csv line2.csv
+$ plot_exps.py test/
+$ ls plot-data/
+apples.pdf oranges.pdf
+
+A directory with many subdirectories:
+$ ls parse-data
+avg-block/  avg-tard/  max-block/  max-tard/  miss-ratio/
+$ ls parse-data/avg-block/tasks/Avg/Avg
+scheduler=C-EDF.csv scheduler=PSN-EDF.csv
+$ plot_exps.py parse-data
+$ ls plot-data
+avg-block_tasks_Avg_Avg.pdf avg-block_tasks_Avg_Max.pdf avg-block_tasks_Avg_Min.pdf
+avg-block_tasks_Max_Avg.pdf avg-block_tasks_Max_Max.pdf avg-block_tasks_Max_Min.pdf
+avg-block_tasks_Min_Avg.pdf avg-block_tasks_Min_Max.pdf avg-block_tasks_Min_Min.pdf
+avg-block_tasks_Var_Avg.pdf avg-block_tasks_Var_Max.pdf avg-block_tasks_Var_Min.pdf
+.......
+
+If you run the previous example directly on the subdirectories,
+subdirectories will be created in the output:
+
+$ plot_exps.py parse-data/*
+$ ls plot-data/
+avg-block/  max-tard/ avg-tard/ miss-ratio/ max-block/
+$ ls plot-data/avg-block/
+tasks_Avg_Avg.pdf  tasks_Avg_Min.pdf  tasks_Max_Max.pdf
+tasks_Min_Avg.pdf  tasks_Min_Min.pdf  tasks_Var_Max.pdf
+tasks_Avg_Max.pdf  tasks_Max_Avg.pdf  tasks_Max_Min.pdf
+tasks_Min_Max.pdf  tasks_Var_Avg.pdf  tasks_Var_Min.pdf
+
+However, when a single directory of directories is given, the script
+assumes the experiments are related and can make line styles match in
+different plots and more effectively parallelize the plotting.
+
-- 
cgit v1.2.2