author    brakmo <brakmo@fb.com>  2019-03-01 15:38:50 -0500
committer Alexei Starovoitov <ast@kernel.org>  2019-03-02 13:48:27 -0500
commit    4ffd44cfd147d157406a26c995cd0c373bffd7a0 (patch)
tree      ef52094b0822f1ae7b3b152f92e5ecb535064180 /samples
parent    a1270fe95b74eb3195b107c494ed1f11b932a278 (diff)
bpf: HBM test script
Script for testing the HBM (Host Bandwidth Manager) framework. It creates
a cgroup to use for testing and loads a BPF program to limit egress
bandwidth. It then uses iperf3 or netperf to create loads. The output is
the goodput in Mbps (unless -D is used).

It can work on a single host using loopback or between two hosts (with
netperf). When using loopback, it is recommended to also introduce a
delay of at least 1ms (-d=1), otherwise the assigned bandwidth is likely
to be underutilized.

USAGE: $name [out] [-b=<prog>|--bpf=<prog>] [-c=<cc>|--cc=<cc>] [-D]
             [-d=<delay>|--delay=<delay>] [--debug] [-E]
             [-f=<#flows>|--flows=<#flows>] [-h] [-i=<id>|--id=<id>]
             [-l] [-N] [-p=<port>|--port=<port>] [-P] [-q=<qdisc>]
             [-R] [-s=<server>|--server=<server>] [--stats]
             [-t=<time>|--time=<time>] [-w] [cubic|dctcp]

Where:
  out             Egress (default)
  -b or --bpf     BPF program filename to load and attach.
                  Default is hbm_out_kern.o for egress.
  -c or --cc      TCP congestion control (cubic or dctcp)
  -d or --delay   Add a delay in ms using netem
  -D              In addition to the goodput in Mbps, it also outputs
                  other detailed information. This information is test
                  dependent (i.e. iperf3 or netperf).
  --debug         Print BPF trace buffer
  -E              Enable ECN (not required for dctcp)
  -f or --flows   Number of concurrent flows (default=1)
  -i or --id      cgroup id (an integer, default is 1)
  -l              Also limit flows using loopback
  -N              Use netperf instead of iperf3
  -h              Help
  -p or --port    iperf3 port (default is 5201)
  -P              Use an iperf3 instance for each flow
  -q              Use the specified qdisc
  -r or --rate    Rate in Mbps (default is 1Gbps)
  -R              Use TCP_RR for netperf. 1st flow has req size of
                  10KB, the rest of 1MB. Reply in all cases is 1 byte.
                  More detailed output for each flow can be found in
                  the files netperf.<cg>.<flow>, where <cg> is the
                  cgroup id as specified with the -i flag, and <flow>
                  is the flow id starting at 1 and increasing by 1 for
                  each flow (as specified by -f).
  -s or --server  Hostname of netperf server. Used to create netperf
                  test traffic between two hosts (default is within
                  host). netserver must be running on the remote host.
  --stats         Get HBM stats (marked, dropped, etc.)
  -t or --time    Duration of iperf3 in seconds (default=5)
  -w              Work conserving flag. cgroup can increase its
                  bandwidth beyond the rate limit specified while
                  there is available bandwidth. Current implementation
                  assumes there is only one NIC (eth0), but can be
                  extended to support multiple NICs. This is just a
                  proof of concept.
  cubic or dctcp  Specify TCP CC to use

Examples:
  ./do_hbm_test.sh -l -d=1 -D --stats
    Runs a 5 second test, using a single iperf3 flow and with the
    default rate limit of 1Gbps and a delay of 1ms (using netem) using
    the default TCP congestion control on the loopback device (hence
    the "-l" to enforce the bandwidth limit on the loopback device).
    Since no direction is specified, it defaults to egress. Since no
    TCP CC algorithm is specified, it uses the system default (cubic
    for this test). With no -D flag, only the value of the
    AGGREGATE_GOODPUT would show. id refers to the cgroup id and is
    useful when running multi-cgroup tests (supported by a future
    patch).
    Output:
      Details for HBM in cgroup 1
      id:1 rate_mbps:493 duration:4.8 secs packets:11355 bytes_MB:590
      pkts_dropped:4497 bytes_dropped_MB:292
      pkts_marked_percent: 39.60 bytes_marked_percent: 49.49
      pkts_dropped_percent: 39.60 bytes_dropped_percent: 49.49
      PING AVG DELAY:2.075
      AGGREGATE_GOODPUT:505

  ./do_hbm_test.sh -l -d=1 -D --stats dctcp
    Same as above but using dctcp. Note that fewer bytes are dropped
    (0.01% vs. 49%).
    Output:
      Details for HBM in cgroup 1
      id:1 rate_mbps:945 duration:4.9 secs packets:16859 bytes_MB:578
      pkts_dropped:1 bytes_dropped_MB:0
      pkts_marked_percent: 28.74 bytes_marked_percent: 45.15
      pkts_dropped_percent: 0.01 bytes_dropped_percent: 0.01
      PING AVG DELAY:2.083
      AGGREGATE_GOODPUT:965

  ./do_hbm_test.sh -d=1 -D --stats
    As the first example, but without limiting the loopback device
    (i.e. no "-l" flag). Since there is no bandwidth limiting, no
    details for HBM are printed out.
    Output:
      Details for HBM in cgroup 1
      PING AVG DELAY:2.019
      AGGREGATE_GOODPUT:42655

  ./do_hbm_test.sh -l -d=1 -D --stats -f=2
    Uses iperf3 and does 2 flows.

  ./do_hbm_test.sh -l -d=1 -D --stats -f=4 -P
    Uses iperf3 and does 4 flows, each flow as a separate process.

  ./do_hbm_test.sh -l -d=1 -D --stats -f=4 -N
    Uses netperf, 4 flows.

  ./do_hbm_test.sh -f=1 -r=2000 -t=5 -N -D --stats dctcp -s=<server-name>
    Uses netperf between two hosts. The remote host name is specified
    with -s= and you need to start the program netserver manually on
    the remote host. It will use 1 flow, a rate limit of 2Gbps and
    dctcp.

  ./do_hbm_test.sh -f=1 -r=2000 -t=5 -N -D --stats -w dctcp \
    -s=<server-name>
    As the previous example, but allows use of extra bandwidth. For
    this test the achieved rate is 8Gbps vs. the 2Gbps of the previous
    test.

This patchset does not support calling TCP's congestion window
reduction, even when packets are dropped by the BPF program, resulting
in a large number of packets dropped. It is recommended that the
current HBM implementation only be used with ECN enabled flows. A
future patch will add support for reducing TCP's cwnd and will improve
the performance of non-ECN enabled flows.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Diffstat (limited to 'samples')
-rwxr-xr-x  samples/bpf/do_hbm_test.sh  436
1 file changed, 436 insertions, 0 deletions
diff --git a/samples/bpf/do_hbm_test.sh b/samples/bpf/do_hbm_test.sh
new file mode 100755
index 000000000000..56c8b4115c95
--- /dev/null
+++ b/samples/bpf/do_hbm_test.sh
@@ -0,0 +1,436 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
#
# Copyright (c) 2019 Facebook
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of version 2 of the GNU General Public
# License as published by the Free Software Foundation.

Usage() {
    echo "Script for testing HBM (Host Bandwidth Manager) framework."
    echo "It creates a cgroup to use for testing and loads a BPF program to limit"
    echo "egress or ingress bandwidth. It then uses iperf3 or netperf to create"
    echo "loads. The output is the goodput in Mbps (unless -D was used)."
    echo ""
    echo "USAGE: $name [out] [-b=<prog>|--bpf=<prog>] [-c=<cc>|--cc=<cc>] [-D]"
    echo "       [-d=<delay>|--delay=<delay>] [--debug] [-E]"
    echo "       [-f=<#flows>|--flows=<#flows>] [-h] [-i=<id>|--id=<id>]"
    echo "       [-l] [-N] [-p=<port>|--port=<port>] [-P]"
    echo "       [-q=<qdisc>] [-R] [-s=<server>|--server=<server>]"
    echo "       [-S|--stats] [-t=<time>|--time=<time>] [-w] [cubic|dctcp]"
    echo " Where:"
    echo "  out             egress (default)"
    echo "  -b or --bpf     BPF program filename to load and attach."
    echo "                  Default is hbm_out_kern.o for egress."
    echo "  -c or --cc      TCP congestion control (cubic or dctcp)"
    echo "  --debug         print BPF trace buffer"
    echo "  -d or --delay   add a delay in ms using netem"
    echo "  -D              in addition to the goodput in Mbps, it also outputs"
    echo "                  other detailed information. This information is"
    echo "                  test dependent (i.e. iperf3 or netperf)."
    echo "  -E              enable ECN (not required for dctcp)"
    echo "  -f or --flows   number of concurrent flows (default=1)"
    echo "  -i or --id      cgroup id (an integer, default is 1)"
    echo "  -N              use netperf instead of iperf3"
    echo "  -l              also limit flows using loopback"
    echo "  -h              help"
    echo "  -p or --port    iperf3 port (default is 5201)"
    echo "  -P              use an iperf3 instance for each flow"
    echo "  -q              use the specified qdisc"
    echo "  -r or --rate    rate in Mbps (default is 1Gbps)"
    echo "  -R              use TCP_RR for netperf. 1st flow has req"
    echo "                  size of 10KB, the rest of 1MB. Reply in all"
    echo "                  cases is 1 byte."
    echo "                  More detailed output for each flow can be found"
    echo "                  in the files netperf.<cg>.<flow>, where <cg> is the"
    echo "                  cgroup id as specified with the -i flag, and <flow>"
    echo "                  is the flow id starting at 1 and increasing by 1 for"
    echo "                  each flow (as specified by -f)."
    echo "  -s or --server  hostname of netperf server. Used to create netperf"
    echo "                  test traffic between two hosts (default is within"
    echo "                  host). netserver must be running on the remote host."
    echo "  -S or --stats   whether to update hbm stats (default is yes)."
    echo "  -t or --time    duration of iperf3 in seconds (default=5)"
    echo "  -w              work conserving flag. cgroup can increase its"
    echo "                  bandwidth beyond the rate limit specified"
    echo "                  while there is available bandwidth. Current"
    echo "                  implementation assumes there is only one NIC"
    echo "                  (eth0), but can be extended to support multiple"
    echo "                  NICs."
    echo "  cubic or dctcp  specify which TCP CC to use"
    echo ""
    exit
}

#set -x

debug_flag=0
args="$@"
name="$0"
netem=0
cc=x
dir="-o"
dir_name="out"
dur=5
flows=1
id=1
prog=""
port=5201
rate=1000
multi_iperf=0
flow_cnt=1
use_netperf=0
rr=0
ecn=0
details=0
server=""
qdisc=""
flags=""
do_stats=0

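# start_hbm launches the hbm user-space program in the background,
# logging its command line and output to hbm.out, and echoes the
# child's PID so the caller can capture it via command substitution.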
function start_hbm () {
    rm -f hbm.out
    echo "./hbm $dir -n $id -r $rate -t $dur $flags $dbg $prog" > hbm.out
    echo " " >> hbm.out
    ./hbm $dir -n $id -r $rate -t $dur $flags $dbg $prog >> hbm.out 2>&1 &
    echo $!
}

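# Parse the command-line arguments saved in $args; options take the
# form -x=value or --long=value, so values are extracted with ${i#*=}.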
processArgs () {
    for i in $args ; do
        case $i in
        # Support for upcoming ingress rate limiting
        #in)        # support for upcoming ingress rate limiting
        #    dir="-i"
        #    dir_name="in"
        #    ;;
        out)
            dir="-o"
            dir_name="out"
            ;;
        -b=*|--bpf=*)
            prog="${i#*=}"
            ;;
        -c=*|--cc=*)
            cc="${i#*=}"
            ;;
        --debug)
            flags="$flags -d"
            debug_flag=1
            ;;
        -d=*|--delay=*)
            netem="${i#*=}"
            ;;
        -D)
            details=1
            ;;
        -E)
            ecn=1
            ;;
        # Support for upcoming fq Early Departure Time egress rate limiting
        #--edt)
        #    prog="hbm_out_edt_kern.o"
        #    qdisc="fq"
        #    ;;
        -f=*|--flows=*)
            flows="${i#*=}"
            ;;
        -i=*|--id=*)
            id="${i#*=}"
            ;;
        -l)
            flags="$flags -l"
            ;;
        -N)
            use_netperf=1
            ;;
        -p=*|--port=*)
            port="${i#*=}"
            ;;
        -P)
            multi_iperf=1
            ;;
        -q=*)
            qdisc="${i#*=}"
            ;;
        -r=*|--rate=*)
            rate="${i#*=}"
            ;;
        -R)
            rr=1
            ;;
        -s=*|--server=*)
            server="${i#*=}"
            ;;
        -S|--stats)
            flags="$flags -s"
            do_stats=1
            ;;
        -t=*|--time=*)
            dur="${i#*=}"
            ;;
        -w)
            flags="$flags -w"
            ;;
        cubic)
            cc=cubic
            ;;
        dctcp)
            cc=dctcp
            ;;
        *)
            echo "Unknown arg:$i"
            Usage
            ;;
        esac
    done
}

processArgs

if [ $debug_flag -eq 1 ] ; then
    rm -f hbm_out.log
fi

hbm_pid=$(start_hbm)
usleep 100000

host=`hostname`
cg_base_dir=/sys/fs/cgroup
cg_dir="$cg_base_dir/cgroup-test-work-dir/hbm$id"

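# Move this shell into the test cgroup so all test traffic generated
# below is subject to the HBM program attached to it.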
echo $$ >> $cg_dir/cgroup.procs

ulimit -l unlimited

rm -f ss.out
rm -f hbm.[0-9]*.$dir_name
if [ $ecn -ne 0 ] ; then
    sysctl -w -q -n net.ipv4.tcp_ecn=1
fi

if [ $use_netperf -eq 0 ] ; then
    cur_cc=`sysctl -n net.ipv4.tcp_congestion_control`
    if [ "$cc" != "x" ] ; then
        sysctl -w -q -n net.ipv4.tcp_congestion_control=$cc
    fi
fi

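# A netem delay (-d) takes precedence over a qdisc requested with -q;
# either one is installed at the root of the loopback device.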
if [ "$netem" -ne "0" ] ; then
    if [ "$qdisc" != "" ] ; then
        echo "WARNING: Ignoring -q option because -d option was used"
    fi
    tc qdisc del dev lo root > /dev/null 2>&1
    tc qdisc add dev lo root netem delay $netem\ms > /dev/null 2>&1
elif [ "$qdisc" != "" ] ; then
    tc qdisc del dev lo root > /dev/null 2>&1
    tc qdisc add dev lo root $qdisc > /dev/null 2>&1
fi

n=0
m=$[$dur * 5]
hn="::1"
if [ $use_netperf -ne 0 ] ; then
    if [ "$server" != "" ] ; then
        hn=$server
    fi
fi

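# Sample round-trip latency in the background for the duration of the
# test: one ping every 0.2s, i.e. 5 samples per second of test time.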
( ping6 -i 0.2 -c $m $hn > ping.out 2>&1 ) &

if [ $use_netperf -ne 0 ] ; then
    begNetserverPid=`ps ax | grep netserver | grep --invert-match "grep" | \
                     awk '{ print $1 }'`
    if [ "$begNetserverPid" == "" ] ; then
        if [ "$server" == "" ] ; then
            ( ./netserver > /dev/null 2>&1) &
            usleep 100000
        fi
    fi
    flow_cnt=1
    if [ "$server" == "" ] ; then
        np_server=$host
    else
        np_server=$server
    fi
    if [ "$cc" == "x" ] ; then
        np_cc=""
    else
        np_cc="-K $cc,$cc"
    fi
    replySize=1
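    # Start one netperf client per flow: TCP_RR request/response tests
    # when -R was given, TCP_STREAM bulk transfers otherwise.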
    while [ $flow_cnt -le $flows ] ; do
        if [ $rr -ne 0 ] ; then
            reqSize=1M
            if [ $flow_cnt -eq 1 ] ; then
                reqSize=10K
            fi
            if [ "$dir" == "-i" ] ; then
                replySize=$reqSize
                reqSize=1
            fi
            ( ./netperf -H $np_server -l $dur -f m -j -t TCP_RR -- -r $reqSize,$replySize $np_cc -k P50_LATENCY,P90_LATENCY,LOCAL_TRANSPORT_RETRANS,REMOTE_TRANSPORT_RETRANS,LOCAL_SEND_THROUGHPUT,LOCAL_RECV_THROUGHPUT,REQUEST_SIZE,RESPONSE_SIZE > netperf.$id.$flow_cnt ) &
        else
            if [ "$dir" == "-i" ] ; then
                ( ./netperf -H $np_server -l $dur -f m -j -t TCP_RR -- -r 1,10M $np_cc -k P50_LATENCY,P90_LATENCY,LOCAL_TRANSPORT_RETRANS,LOCAL_SEND_THROUGHPUT,REMOTE_TRANSPORT_RETRANS,REMOTE_SEND_THROUGHPUT,REQUEST_SIZE,RESPONSE_SIZE > netperf.$id.$flow_cnt ) &
            else
                ( ./netperf -H $np_server -l $dur -f m -j -t TCP_STREAM -- $np_cc -k P50_LATENCY,P90_LATENCY,LOCAL_TRANSPORT_RETRANS,LOCAL_SEND_THROUGHPUT,REQUEST_SIZE,RESPONSE_SIZE > netperf.$id.$flow_cnt ) &
            fi
        fi
        flow_cnt=$[flow_cnt+1]
    done

    # sleep for duration of test (plus some buffer)
    n=$[dur+2]
    sleep $n

    # force graceful termination of netperf
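    # (netperf uses SIGALRM internally as its test timer, so the signal
    # ends the test cleanly and final statistics are still reported)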
    pids=`pgrep netperf`
    for p in $pids ; do
        kill -SIGALRM $p
    done

    flow_cnt=1
    rate=0
    if [ $details -ne 0 ] ; then
        echo ""
        echo "Details for HBM in cgroup $id"
        if [ $do_stats -eq 1 ] ; then
            if [ -e hbm.$id.$dir_name ] ; then
                cat hbm.$id.$dir_name
            fi
        fi
    fi
    while [ $flow_cnt -le $flows ] ; do
        if [ "$dir" == "-i" ] ; then
            r=`cat netperf.$id.$flow_cnt | grep -o "REMOTE_SEND_THROUGHPUT=[0-9]*" | grep -o "[0-9]*"`
        else
            r=`cat netperf.$id.$flow_cnt | grep -o "LOCAL_SEND_THROUGHPUT=[0-9]*" | grep -o "[0-9]*"`
        fi
        echo "rate for flow $flow_cnt: $r"
        rate=$[rate+r]
        if [ $details -ne 0 ] ; then
            echo "-----"
            echo "Details for cgroup $id, flow $flow_cnt"
            cat netperf.$id.$flow_cnt
        fi
        flow_cnt=$[flow_cnt+1]
    done
    if [ $details -ne 0 ] ; then
        echo ""
        delay=`grep "avg" ping.out | grep -o "= [0-9.]*/[0-9.]*" | grep -o "[0-9.]*$"`
        echo "PING AVG DELAY:$delay"
        echo "AGGREGATE_GOODPUT:$rate"
    else
        echo $rate
    fi
elif [ $multi_iperf -eq 0 ] ; then
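    # Single iperf3 server for all flows; -1 makes the server handle
    # one client session and then exit.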
    (iperf3 -s -p $port -1 > /dev/null 2>&1) &
    usleep 100000
    iperf3 -c $host -p $port -i 0 -P $flows -f m -t $dur > iperf.$id
    rates=`grep receiver iperf.$id | grep -o "[0-9.]* Mbits" | grep -o "^[0-9]*"`
    rate=`echo $rates | grep -o "[0-9]*$"`

    if [ $details -ne 0 ] ; then
        echo ""
        echo "Details for HBM in cgroup $id"
        if [ $do_stats -eq 1 ] ; then
            if [ -e hbm.$id.$dir_name ] ; then
                cat hbm.$id.$dir_name
            fi
        fi
        delay=`grep "avg" ping.out | grep -o "= [0-9.]*/[0-9.]*" | grep -o "[0-9.]*$"`
        echo "PING AVG DELAY:$delay"
        echo "AGGREGATE_GOODPUT:$rate"
    else
        echo $rate
    fi
else
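    # With -P, run a separate iperf3 server/client pair per flow, each
    # on its own port, so every flow runs as its own process.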
    flow_cnt=1
    while [ $flow_cnt -le $flows ] ; do
        (iperf3 -s -p $port -1 > /dev/null 2>&1) &
        ( iperf3 -c $host -p $port -i 0 -P 1 -f m -t $dur | grep receiver | grep -o "[0-9.]* Mbits" | grep -o "^[0-9]*" | grep -o "[0-9]*$" > iperf3.$id.$flow_cnt ) &
        port=$[port+1]
        flow_cnt=$[flow_cnt+1]
    done
    n=$[dur+1]
    sleep $n
    flow_cnt=1
    rate=0
    if [ $details -ne 0 ] ; then
        echo ""
        echo "Details for HBM in cgroup $id"
        if [ $do_stats -eq 1 ] ; then
            if [ -e hbm.$id.$dir_name ] ; then
                cat hbm.$id.$dir_name
            fi
        fi
    fi

    while [ $flow_cnt -le $flows ] ; do
        r=`cat iperf3.$id.$flow_cnt`
#       echo "rate for flow $flow_cnt: $r"
        if [ $details -ne 0 ] ; then
            echo "Rate for cgroup $id, flow $flow_cnt LOCAL_SEND_THROUGHPUT=$r"
        fi
        rate=$[rate+r]
        flow_cnt=$[flow_cnt+1]
    done
    if [ $details -ne 0 ] ; then
        delay=`grep "avg" ping.out | grep -o "= [0-9.]*/[0-9.]*" | grep -o "[0-9.]*$"`
        echo "PING AVG DELAY:$delay"
        echo "AGGREGATE_GOODPUT:$rate"
    else
        echo $rate
    fi
fi

if [ $use_netperf -eq 0 ] ; then
    sysctl -w -q -n net.ipv4.tcp_congestion_control=$cur_cc
fi
if [ $ecn -ne 0 ] ; then
    sysctl -w -q -n net.ipv4.tcp_ecn=0
fi
if [ "$netem" -ne "0" ] ; then
    tc qdisc del dev lo root > /dev/null 2>&1
fi

sleep 2

hbmPid=`ps ax | grep "hbm " | grep --invert-match "grep" | awk '{ print $1 }'`
if [ "$hbmPid" == "$hbm_pid" ] ; then
    kill $hbm_pid
fi

sleep 1

# Detach any BPF programs that may have lingered
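# bpftool cgroup tree lists, for each attachment, the cgroup path
# followed by the program id and attach type; walk its whitespace-
# separated tokens with a small state machine and detach every program
# found under the test directory.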
ttx=`bpftool cgroup tree | grep hbm`
v=2
for x in $ttx ; do
    if [ "${x:0:36}" == "/sys/fs/cgroup/cgroup-test-work-dir/" ] ; then
        cg=$x ; v=0
    else
        if [ $v -eq 0 ] ; then
            id=$x ; v=1
        else
            if [ $v -eq 1 ] ; then
                type=$x ; bpftool cgroup detach $cg $type id $id
                v=0
            fi
        fi
    fi
done

if [ $use_netperf -ne 0 ] ; then
    if [ "$server" == "" ] ; then
        if [ "$begNetserverPid" == "" ] ; then
            netserverPid=`ps ax | grep netserver | grep --invert-match "grep" | awk '{ print $1 }'`
            if [ "$netserverPid" != "" ] ; then
                kill $netserverPid
            fi
        fi
    fi
fi
exit