From 4f97e3e478b4b248d993bce56c1c6bb737decbbe Mon Sep 17 00:00:00 2001
From: Jonathan Herman
Date: Tue, 19 Mar 2013 15:06:36 -0400
Subject: Added README.

---
 README | 507 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 507 insertions(+)
 create mode 100644 README

diff --git a/README b/README
new file mode 100644
index 0000000..e1a0815
--- /dev/null
+++ b/README
@@ -0,0 +1,507 @@
+I. INTRODUCTION
+These scripts provide a common way for creating, running, parsing, and
+plotting experiments under LITMUS^RT. They are designed with the
+following principles in mind:
+
+1. Little or no configuration: all scripts use certain parameters to
+configure behavior. However, if the user does not give these
+parameters, the scripts will examine the properties of the user's
+system to pick a suitable default. Requiring user input is a last
+resort.
+
+2. Interruptibility: the scripts save their work as they evaluate
+multiple directories. When the scripts are interrupted, or if new data
+is added to those directories, the scripts can be re-run and they will
+resume where they left off. This vastly decreases turnaround time for
+testing new features.
+
+3. Maximum Safety: where possible, scripts save metadata in their output
+directories about the data contained. This metadata can be used by
+the other scripts to safely use the data later.
+
+4. Independence / legacy support: none of these scripts assume their
+input was generated by another of these scripts. Three are designed to
+recognize generic input formats inspired by past LITMUS^RT
+experimental setups. (The exception to this is gen_exps.py, which
+has only user input and creates output only for run_exps.py.)
+
+5. Save everything: all output and parameters (even from subprocesses)
+are saved for debugging / reproducibility. This data is saved in tmp/
+directories while scripts are running in case scripts fail.
+
+These scripts require that the following repos are in the user's PATH:
+1. liblitmus - for real-time executable simulation and task set release
+2. feather-trace-tools - for recording and parsing overheads and
+   scheduling events
+
+Optionally, additional features will be enabled if these repos are
+present in the PATH:
+1. rt-kernelshark - to record ftrace events for kernelshark visualization
+2. sched_trace - to output a file containing scheduling events as
+strings
+
+Each of these scripts is designed to operate independently of the
+others. For example, parse_exps.py will find any feather trace
+files resembling ft-xyz.bin or xyz.ft and print out overhead
+statistics for the records inside. However, the scripts provide the
+most features (especially safety) when their results are chained
+together, like so:
+
+gen_exps.py --> [exps/*] --> run_exps.py --> [run-data/*] --.
+.------------------------------------------------------------'
+'--> parse_exps.py --> [parse-data/*] --> plot_exps.py --> [plot-data/*.pdf]
+
+0. Create experiments with gen_exps.py or some other script.
+1. Run experiments using run_exps.py, generating binary files in run-data/.
+2. Parse binary data in run-data using parse_exps.py, generating csv
+   files in parse-data/.
+3. Plot parse-data using plot_exps.py, generating pdfs in plot-data.
+
+Each of these scripts will be described. The run_exps.py script is
+first because gen_exps.py creates schedule files which depend on run_exps.py.
+
+
+II. RUN_EXPS
+Usage: run_exps.py [OPTIONS] [SCHED_FILE]... [SCHED_DIR]...
+  where a SCHED_DIR resembles:
+      SCHED_DIR/
+          SCHED_FILE
+          PARAM_FILE
+
+Output: OUT_DIR/[files] or OUT_DIR/SCHED_DIR/[files] or
+        OUT_DIR/SCHED_FILE/[files] depending on input
+        If all features are enabled, these files are:
+        OUT_DIR/[.*/]
+            trace.slog      # LITMUS logging
+            st-[1..m].bin   # sched_trace data
+            ft.bin          # feather-trace overhead data
+            trace.dat       # ftrace data for kernelshark
+            params.py       # Schedule parameters
+            exec-out.txt    # Standard out from schedule processes
+            exec-err.txt    # Standard err from schedule processes
+
+Defaults: SCHED_FILE = sched.py, PARAM_FILE = params.py,
+          DURATION = 30, OUT_DIR = run-data/
+
+The run_exps.py script reads schedule files and executes real-time
+task systems, recording all overhead, logging, and trace data which is
+enabled in the system. For example, if trace logging is enabled and
+rt-kernelshark is found in the PATH, but feather-trace is disabled
+(the devices are not present), only trace-logs and kernelshark logs
+will be recorded.
+
+When run_exps.py is running a schedule file, temporary data is saved
+in a 'tmp' directory in the same directory as the schedule file. When
+execution completes, this data is moved into a directory under the
+run_exps.py output directory (default: 'run-data/', can be changed with
+the -o option). When multiple schedules are run, each schedule's data
+is saved in a unique directory under the output directory.
+
+If a schedule has been run and its data is in the output directory,
+run_exps.py will not re-run the schedule unless the -f option is
+specified. This is useful if your system crashes midway through a set
+of experiments.
+
+Schedule files have one of the following two formats:
+
+a) simple format
+    path/to/proc{proc_value}
+    ...
+    path/to/proc{proc_value}
+    [real_time_task: default rtspin] task_arguments...
+    ...
+    [real_time_task] task_arguments...
+
+b) python format
+    {'proc':[
+        ('path/to/proc','proc_value'),
+        ...,
+        ('path/to/proc','proc_value')
+      ],
+     'spin':[
+        ('real_time_task', 'task_arguments'),
+        ...
+        ('real_time_task', 'task_arguments')
+      ]
+    }
+
+The following creates a simple 3-task system with a total utilization of
+roughly 1.9 (10/20 + 30/40 + 60/90), which is then run under the GSN-EDF
+plugin:
+
+$ echo "10 20
+30 40
+60 90" > test.sched
+$ run_exps.py -s GSN-EDF test.sched
+
+The following sets the release master by writing to
+/proc/litmus/release_master:
+
+$ echo "release_master{2}
+10 20" > test.sched && run_exps.py -s GSN-EDF test.sched
+
+A longer form can be used for proc entries not in /proc/litmus:
+
+$ echo "/proc/sys/something{hello}
+10 20" > test.sched
+
+You can also run your own spin programs instead of rtspin by putting
+their names at the beginning of the line:
+
+$ echo "colorspin -f color1.csv 10 20" > test.sched
+
+This example also shows how you can reference files in the same
+directory as the schedule file on the command line.
+
+You can specify parameters for an experiment in a file instead of on
+the command line using params.py (the -p option lets you choose the
+name of this file if params.py is not for you):
+
+$ echo "{'scheduler':'GSN-EDF', 'duration':10}" > params.py
+$ run_exps.py test.sched
+
+You can also run multiple experiments with a single command, provided
+a directory with a schedule file exists for each. By default, the
+program will look for sched.py for the schedule file and params.py for
+the parameter file, but this behavior can be changed using the -p and
+-c options.
+
+You can include non-relevant parameters which run_exps.py does not
+understand in params.py. These parameters will be saved with the data
+output by run_exps.py. This is useful for tracking variations in
+system parameters versus experimental results.
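As a rough illustration of how the two schedule file formats described
earlier in this section relate, the sketch below converts the simple
format into the python format. The helper name simple_to_python and its
parsing rules are assumptions made for illustration: a line of the form
name{value} is treated as a proc entry, a line whose first token starts
with a digit or '-' is treated as arguments for the default rtspin task,
and any other line names its own task. This is not the logic run_exps.py
itself uses.

```python
import re

# Hypothetical helper for illustration only; not part of run_exps.py.
PROC_LINE = re.compile(r'^(\S+)\{(.*)\}$')


def simple_to_python(text, default_task='rtspin'):
    """Convert a simple-format schedule into a python-format dict."""
    sched = {'proc': [], 'spin': []}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = PROC_LINE.match(line)
        if match:
            # e.g. release_master{2} -> ('release_master', '2')
            sched['proc'].append((match.group(1), match.group(2)))
            continue
        first, _, rest = line.partition(' ')
        if first[0].isdigit() or first.startswith('-'):
            # No task name given, so the whole line is arguments
            # for the default task.
            sched['spin'].append((default_task, line))
        else:
            sched['spin'].append((first, rest))
    return sched


if __name__ == '__main__':
    simple = "release_master{2}\n10 20\ncolorspin -f color1.csv 10 20"
    print(simple_to_python(simple))
```

Deciding by the first token keeps the sketch short, but it would
misread a task program whose name starts with a digit.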
+
+In the following example, multiple experiments are demonstrated and an
+extra parameter 'test-param' is included:
+
+$ mkdir test1
+# The duration will default to 30 and need not be specified
+$ echo "{'scheduler':'C-EDF', 'test-param':1}" > test1/params.py
+$ echo "10 20" > test1/sched.py
+$ cp -r test1 test2
+$ echo "{'scheduler':'GSN-EDF', 'test-param':2}" > test2/params.py
+$ run_exps.py test*
+
+Finally, you can specify system properties in params.py which the
+environment must match for the experiment to run. These are useful if
+you have a large batch of experiments which must be run under
+different kernels. The first property is a regular expression for the
+uname of the system:
+
+$ uname -r
+3.0.0-litmus
+$ cp params.py old_params.py
+$ echo "{'uname': r'.*linux.*'}" >> params.py
+# run_exps.py will now complain of an invalid environment for this
+# experiment
+$ cp old_params.py params.py
+$ echo "{'uname': r'.*litmus.*'}" >> params.py
+# run_exps.py will now succeed
+
+The second property is a set of kernel configuration options. These
+assume the configuration is stored at /boot/config-`uname -r`. You can
+specify these like so:
+
+$ echo "{'config-options':{
+'RELEASE_MASTER' : 'y',
+'ARM' : 'y'}}" >> params.py
+# Only executes on ARM systems with the release master enabled
+
+
+III. GEN_EXPS
+Usage: gen_exps.py [options] [files...] [generators...] [param=val[,val]...]
+Output: exps/EXP_DIRS which each contain sched.py and params.py
+Defaults: generators = G-EDF P-EDF C-EDF
+
+The gen_exps.py script uses 'generators', one for each LITMUS
+scheduler supported. Each generator has different properties which can
+be varied to generate different types of schedules. Each of these
+properties has a default value which can be modified on the command
+line for quick and easy experiment generation.
+
+This script as written should be used to create debugging task sets,
+but not for creating task sets for experiments shown in papers. That
+is because the safety features of run_exps.py described above (uname,
+config-options) are not used here. If you are creating experiments for
+a paper, you should create your own generator which outputs values for
+'config-options' required for your plugin so that you cannot ruin your
+experiments at run time.
+
+The -l option lists the supported generators which can be specified:
+
+$ gen_exps.py -l
+G-EDF, P-EDF, C-EDF
+
+The -d option will describe the properties of a generator or
+generators and their default values. Note that some of these defaults
+will vary depending on the system on which the script is run. For
+example, 'cpus' defaults to the number of cpus on the current system,
+in this example 24.
+
+$ gen_exps.py -d G-EDF,P-EDF
+Generator GSN-EDF:
+    num_tasks -- Number of tasks per experiment.
+        Default: [24, 48, 72, 96]
+        Allowed:
+    ....
+
+Generator PSN-EDF:
+    num_tasks -- Number of tasks per experiment.
+        Default: [24, 48, 72, 96]
+        Allowed:
+    cpus -- Number of processors on target system.
+        Default: [24]
+        Allowed:
+    ....
+
+You create experiments by specifying a generator. The following will
+create 4 schedules with 24, 48, 72, and 96 tasks, because the default
+value of num_tasks is an array of these values:
+
+$ gen_exps.py P-EDF
+$ ls exps/
+sched=PSN-EDF_num-tasks=24/  sched=PSN-EDF_num-tasks=48/
+sched=PSN-EDF_num-tasks=72/  sched=PSN-EDF_num-tasks=96/
+
+You can modify the default using a single value (the -f option deletes
+previous experiments in the output directory, defaulting to 'exps/',
+changeable with -o):
+
+$ gen_exps.py -f P-EDF num_tasks=24
+$ ls exps/
+sched=PSN-EDF_num-tasks=24/
+
+Or with an array of values, specified as a comma-separated list:
+
+$ gen_exps.py -f num_tasks=`seq -s, 24 2 30` P-EDF
+$ ls exps/
+sched=PSN-EDF_num-tasks=24/  sched=PSN-EDF_num-tasks=26/
+sched=PSN-EDF_num-tasks=28/  sched=PSN-EDF_num-tasks=30/
+
+The generator will create a different directory for each possible
+configuration of the parameters. Each parameter which is varied is
+included in the name of the schedule directory. For example, to vary
+the number of CPUs but not the number of tasks:
+
+$ gen_exps.py -f num_tasks=24 cpus=3,6 P-EDF
+$ ls exps
+sched=PSN-EDF_cpus=3/  sched=PSN-EDF_cpus=6/
+
+The values of non-varying parameters are still saved in
+params.py. Continuing the example above:
+
+$ cat exps/sched\=PSN-EDF_cpus\=3/params.py
+{'periods': 'harmonic', 'release_master': False, 'duration': 30,
+ 'utils': 'uni-medium', 'scheduler': 'PSN-EDF', 'cpus': 3}
+
+You can also have multiple schedules generated with the same
+configuration using the -n option:
+
+$ gen_exps.py -f num_tasks=24 -n 5 P-EDF
+$ ls exps/
+sched=PSN-EDF_trial=0/  sched=PSN-EDF_trial=1/  sched=PSN-EDF_trial=2/
+sched=PSN-EDF_trial=3/  sched=PSN-EDF_trial=4/
+
+
+IV. PARSE_EXPS
+Usage: parse_exps.py [options] [data_dir1] [data_dir2]...
+  where data_dirs contain feather-trace and sched-trace data,
+  e.g. ft.bin, mysched.ft, or st-*.bin.
+
+Output: print out all parsed data or
+        OUT_FILE where OUT_FILE is a python map of the data or
+        OUT_DIR/[FIELD]*/[PARAM]/[TYPE]/[TYPE]/[LINE].csv
+
+        The goal is to create csv files which record how varying PARAM
+        changes the value of FIELD. Only PARAMs which vary are
+        considered.
+
+        FIELD is a parsed value, e.g. 'RELEASE' overhead or 'miss-ratio'
+        PARAM is a parameter which we are going to vary, e.g. 'tasks'
+        A single LINE is created for every configuration of parameters
+        other than PARAM.
+
+        TYPE is the type of measurement, i.e. Max, Min, Avg, or
+        Var[iance]. The two types are used to differentiate between
+        measurements across tasks in a single taskset, and
+        measurements across all tasksets. E.g. miss-ratio/*/Max/Avg
+        is the maximum of all the average miss ratios for each task set, while
+        miss-ratio/*/Avg/Max is the average of the maximum miss ratios
+        for each task set.
+
+Defaults: OUT_DIR or OUT_FILE = parse-data, data_dir1 = '.'
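The two TYPE levels described above can be easy to mix up, so here is a
small worked sketch. The function names and the miss ratios are made up
for illustration, and the use of population variance for Var is an
assumption; parse_exps.py may compute its statistics differently.

```python
def summarize(values):
    """Return the Max, Min, Avg and Var of a list of numbers."""
    avg = sum(values) / len(values)
    # Assumption: population variance (divide by N, not N - 1).
    var = sum((v - avg) ** 2 for v in values) / len(values)
    return {'Max': max(values), 'Min': min(values), 'Avg': avg, 'Var': var}


def across_tasksets(tasksets, outer, inner):
    """Compute FIELD/*/OUTER/INNER.

    INNER is applied to the tasks within each task set, then OUTER is
    applied to the resulting per-task-set values.
    """
    per_taskset = [summarize(tasks)[inner] for tasks in tasksets]
    return summarize(per_taskset)[outer]


if __name__ == '__main__':
    # Made-up miss ratios: two task sets with two tasks each.
    miss_ratios = [[0.0, 0.5], [0.25, 0.25]]
    # miss-ratio/*/Max/Avg: maximum of the per-task-set averages.
    print(across_tasksets(miss_ratios, 'Max', 'Avg'))
    # miss-ratio/*/Avg/Max: average of the per-task-set maxima.
    print(across_tasksets(miss_ratios, 'Avg', 'Max'))
```

With this data the per-task-set averages are both 0.25, so Max/Avg is
0.25, while the per-task-set maxima are 0.5 and 0.25, so Avg/Max is
0.375.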
+
+The parse_exps.py script reads a directory or directories, parses the
+binary files inside for feather-trace or sched-trace data, then
+summarizes and organizes the results for output. The output can be to
+the console, to a python map, or to a directory tree of csvs (the
+default whenever enough data is found). The python map (using -m) can
+be used for schedulability tests. The directory tree can be used to
+look at how changing parameters affects certain measurements.
+
+The script will use half of the current computer's CPUs to process data.
+
+In the following example, too little data was found to create csv
+files, so the data is output to the console despite the user not
+specifying the -v option. This use is the easiest for quick overhead
+evaluation and debugging. Note that for overhead measurements like
+these, parse_exps.py will use the 'clock-frequency' parameter saved in
+a params.py file by run_exps.py to calculate overhead measurements. If
+a param file is not present, as in this case, the current CPU's
+frequency will be used.
+
+$ ls run-data/
+taskset_scheduler=C-FL-split-L3_host=ludwig_n=10_idx=05_split=randsplit.ft
+$ parse_exps.py run-data/
+Loading experiments...
+Parsing data...
+ 0.00%
+Writing result...
+Too little data to make csv files.
+
+    CXS:    Avg:  5.053    Max: 59.925    Min:  0.241
+    SCHED:  Avg:  4.410    Max: 39.350    Min:  0.357
+    TICK:   Avg:  1.812    Max: 21.380    Min:  0.241
+
+In the next example, because the value of num-tasks varies, csvs can
+be created:
+
+$ ls run-data/
+sched=C-EDF_num-tasks=4/   sched=GSN-EDF_num-tasks=4/
+sched=C-EDF_num-tasks=8/   sched=GSN-EDF_num-tasks=8/
+sched=C-EDF_num-tasks=12/  sched=GSN-EDF_num-tasks=12/
+sched=C-EDF_num-tasks=16/  sched=GSN-EDF_num-tasks=16/
+$ parse_exps.py run-data/*
+$ ls parse-data/
+avg-block/  avg-tard/  max-block/  max-tard/  miss-ratio/
+
+The varying parameters were found by reading the params.py files under
+each run-data subdirectory.
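One way to picture how varying parameters might be detected is sketched
below. The function name varying_params and the sample dictionaries are
hypothetical, and the sketch assumes each params.py has already been
loaded into a Python dict; it is not taken from parse_exps.py.

```python
def varying_params(param_dicts):
    """Return the sorted names of parameters that take more than one value."""
    seen = {}
    for params in param_dicts:
        for name, value in params.items():
            # repr() keeps unhashable values (e.g. nested dicts) usable in a set.
            seen.setdefault(name, set()).add(repr(value))
    return sorted(name for name, values in seen.items() if len(values) > 1)


if __name__ == '__main__':
    # Made-up contents of four params.py files.
    runs = [
        {'scheduler': 'C-EDF', 'num-tasks': 4, 'duration': 30},
        {'scheduler': 'C-EDF', 'num-tasks': 8, 'duration': 30},
        {'scheduler': 'GSN-EDF', 'num-tasks': 4, 'duration': 30},
        {'scheduler': 'GSN-EDF', 'num-tasks': 8, 'duration': 30},
    ]
    print(varying_params(runs))
```

Here 'duration' is the same in every run, so only 'num-tasks' and
'scheduler' are reported as varying.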
+
+You can use the -v option to print out the values measured even when
+csvs could be created.
+
+You can use the -i option to ignore variations in a certain parameter
+(or parameters if a comma-separated list is given). In the following
+example, the user has decided that the 'option' parameter does not
+matter after viewing output. Note that the 'trial' parameter, used by
+gen_exps.py to create multiple schedules with the same configuration,
+is always ignored.
+
+$ ls run-data/
+sched=C-EDF_num-tasks=4_option=1/  sched=C-EDF_num-tasks=4_option=2/
+sched=C-EDF_num-tasks=8_option=1/  sched=C-EDF_num-tasks=8_option=2/
+$ parse_exps.py run-data/*
+$ for i in parse-data/miss-ratio/tasks/Avg/Avg/*; do basename $i; cat $i; done
+option=1.csv
+ 4 .1
+ 8 .2
+option=2.csv
+ 4 .2
+ 8 .4
+# Now ignore 'option' for more accurate results
+$ parse_exps.py -i option run-data/*
+$ for i in parse-data/miss-ratio/tasks/Avg/Avg/*; do basename $i; cat $i; done
+line.csv
+ 4 .2
+ 8 .3
+
+The second command will also have run faster than the first. This is
+because parse_exps.py will save the data it parses in tmp/ directories
+before it attempts to sort it into csvs. Parsing takes far longer than
+sorting, so this saves a lot of time. The -f flag can be used to
+re-parse files and overwrite this saved data.
+
+All output from the feather-trace-tools programs used to parse data is
+stored in the tmp/ directories created in the input directories. If
+the sched_trace repo is found in the user's PATH, st_show will be used
+to create a human-readable version of the sched-trace data which will
+also be stored there.
+
+
+V. PLOT_EXPS
+Usage: plot_exps.py [options] [csv_dir]...
+  where a csv dir is a directory or directory of directories (and
+  so on) containing csvs, like:
+      csv_dir/[subdirs/...]
+          line1.csv
+          line2.csv
+          line3.csv
+
+Outputs: OUT_DIR/[csv_dir/]*[plot]*.pdf
+  where a single plot exists for each directory of csvs, with a
+  line for each csv file in that directory. If only a
+  single csv_dir is specified, all plots are placed directly
+  under OUT_DIR.
+
+Defaults: OUT_DIR = 'plot-data/', csv_dir = '.'
+
+The plot_exps.py script takes directories of csvs (or directories
+formatted as specified below) and creates a pdf plot of each csv
+directory found. A line is drawn for each .csv file in that
+directory. Matplotlib is used to do the plotting.
+
+If the csv files are formatted like:
+
+    param=value_param2=value2.csv
+
+the variation of these parameters will be used to style the lines in
+the most readable way. For instance, if there are three parameters,
+variations in one parameter will change line color, another line
+style (dashes/dots/etc), and a third line markers
+(triangles/circles/etc).
+
+If a directory of directories is passed in, the script will assume the
+top level directory is the measured value and the next level is the
+variable, i.e.:
+
+    value/variable/[..../]line.csv
+
+and will title the plot "Value by variable (...)". Otherwise,
+the name of the top level directory will be the title, like "Value".
+
+A directory with some lines:
+$ ls
+line1.csv  line2.csv
+$ plot_exps.py
+$ ls plot-data/
+plot.pdf
+
+A directory with a few subdirectories:
+$ ls test/
+apples/  oranges/
+$ ls test/apples/
+line1.csv  line2.csv
+$ plot_exps.py test/
+$ ls plot-data/
+apples.pdf  oranges.pdf
+
+A directory with many subdirectories:
+$ ls parse-data
+avg-block/  avg-tard/  max-block/  max-tard/  miss-ratio/
+$ ls parse-data/avg-block/tasks/Avg/Avg
+scheduler=C-EDF.csv  scheduler=PSN-EDF.csv
+$ plot_exps.py parse-data
+$ ls plot-data
+avg-block_tasks_Avg_Avg.pdf  avg-block_tasks_Avg_Max.pdf  avg-block_tasks_Avg_Min.pdf
+avg-block_tasks_Max_Avg.pdf  avg-block_tasks_Max_Max.pdf  avg-block_tasks_Max_Min.pdf
+avg-block_tasks_Min_Avg.pdf  avg-block_tasks_Min_Max.pdf  avg-block_tasks_Min_Min.pdf
+avg-block_tasks_Var_Avg.pdf  avg-block_tasks_Var_Max.pdf  avg-block_tasks_Var_Min.pdf
+.......
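The param=value_param2=value2.csv naming convention described above can
be read back with a few lines of Python. The helper name line_params is
hypothetical, and splitting on '_' and '=' is an assumption based only
on the file names shown in this README; plot_exps.py may parse names
differently.

```python
import os


def line_params(csv_path):
    """Split 'a=1_b=2.csv' into {'a': '1', 'b': '2'}.

    Returns an empty dict when the name does not follow the
    param=value convention (e.g. 'line1.csv').
    """
    name = os.path.splitext(os.path.basename(csv_path))[0]
    params = {}
    for piece in name.split('_'):
        key, sep, value = piece.partition('=')
        if not sep:
            return {}
        params[key] = value
    return params


if __name__ == '__main__':
    print(line_params('parse-data/avg-block/tasks/Avg/Avg/scheduler=C-EDF.csv'))
    print(line_params('sched=C-EDF_num-tasks=4.csv'))
    print(line_params('line1.csv'))
```

Each key recovered this way could then be mapped to one visual
property (color, line style, marker), which is the idea the text above
describes.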
+
+If you run the previous example directly on the subdirectories,
+subdirectories will be created in the output:
+
+$ plot_exps.py parse-data/*
+$ ls plot-data/
+avg-block/  max-tard/  avg-tard/  miss-ratio/  max-block/
+$ ls plot-data/avg-block/
+tasks_Avg_Avg.pdf  tasks_Avg_Min.pdf  tasks_Max_Max.pdf
+tasks_Min_Avg.pdf  tasks_Min_Min.pdf  tasks_Var_Max.pdf
+tasks_Avg_Max.pdf  tasks_Max_Avg.pdf  tasks_Max_Min.pdf
+tasks_Min_Max.pdf  tasks_Var_Avg.pdf  tasks_Var_Min.pdf
+
+However, when a single directory of directories is given, the script
+assumes the experiments are related and can make line styles match in
+different plots and more effectively parallelize the plotting.
+
--
cgit v1.2.2