 Documentation/trace/events.txt                                       | 1555
 Documentation/trace/ftrace.txt                                       |   34
 include/linux/trace_events.h                                         |  120
 kernel/trace/Kconfig                                                 |   26
 kernel/trace/Makefile                                                |    2
 kernel/trace/trace.c                                                 |  275
 kernel/trace/trace.h                                                 |  190
 kernel/trace/trace_events.c                                          |  329
 kernel/trace/trace_events_filter.c                                   |   77
 kernel/trace/trace_events_hist.c                                     | 1755
 kernel/trace/trace_events_trigger.c                                  |  215
 kernel/trace/tracing_map.c                                           | 1062
 kernel/trace/tracing_map.h                                           |  283
 tools/testing/selftests/ftrace/test.d/functions                      |    9
 tools/testing/selftests/ftrace/test.d/trigger/trigger-eventonoff.tc  |   64
 tools/testing/selftests/ftrace/test.d/trigger/trigger-filter.tc      |   59
 tools/testing/selftests/ftrace/test.d/trigger/trigger-hist-mod.tc    |   75
 tools/testing/selftests/ftrace/test.d/trigger/trigger-hist.tc        |   83
 tools/testing/selftests/ftrace/test.d/trigger/trigger-multihist.tc   |   73
 tools/testing/selftests/ftrace/test.d/trigger/trigger-snapshot.tc    |   56
 tools/testing/selftests/ftrace/test.d/trigger/trigger-stacktrace.tc  |   53
 tools/testing/selftests/ftrace/test.d/trigger/trigger-traceonoff.tc  |   58
 22 files changed, 6041 insertions(+), 412 deletions(-)
diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
index c010be8c85d7..08d74d75150d 100644
--- a/Documentation/trace/events.txt
+++ b/Documentation/trace/events.txt
@@ -512,3 +512,1558 @@ The following commands are supported:

Note that there can be only one traceon or traceoff trigger per
triggering event.

- hist

This command aggregates event hits into a hash table keyed on one or
more trace event format fields (or stacktrace) and a set of running
totals derived from one or more trace event format fields and/or
event counts (hitcount).

The format of a hist trigger is as follows:

    hist:keys=<field1[,field2,...]>[:values=<field1[,field2,...]>]
      [:sort=<field1[,field2,...]>][:size=#entries][:pause][:continue]
      [:clear][:name=histname1] [if <filter>]

When a matching event is hit, an entry is added to a hash table
using the key(s) and value(s) named. Keys and values correspond to
fields in the event's format description. Values must correspond to
numeric fields - on an event hit, the value(s) will be added to a
sum kept for that field. The special string 'hitcount' can be used
in place of an explicit value field - this is simply a count of
event hits. If 'values' isn't specified, an implicit 'hitcount'
value will be automatically created and used as the only value.
Keys can be any field, or the special string 'stacktrace', which
will use the event's kernel stacktrace as the key. The keywords
'keys' or 'key' can be used to specify keys, and the keywords
'values', 'vals', or 'val' can be used to specify values. Compound
keys consisting of up to two fields can be specified by the 'keys'
keyword. Hashing a compound key produces a unique entry in the
table for each unique combination of component keys, and can be
useful for providing more fine-grained summaries of event data.
Additionally, sort keys consisting of up to two fields can be
specified by the 'sort' keyword. If more than one field is
specified, the result will be a 'sort within a sort': the first key
is taken to be the primary sort key and the second the secondary
key. If a hist trigger is given a name using the 'name' parameter,
its histogram data will be shared with other triggers of the same
name, and trigger hits will update this common data. Only triggers
with 'compatible' fields can be combined in this way; triggers are
'compatible' if they name the same number of fields and those
fields have the same types and names. Note that any two events
always share the compatible 'hitcount' and 'stacktrace' fields and
can therefore be combined using those fields, however pointless
that may be.

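As an example of a named trigger, the same histogram can be attached
to two different events and updated by both (a sketch; the events
chosen here are just an illustration - any two events share the
common_pid field):

# echo 'hist:keys=common_pid.execname:name=foo' > \
      /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger

# echo 'hist:keys=common_pid.execname:name=foo' > \
      /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger

Reading the 'hist' file of either event will then display the same
shared table, updated by hits on both events.
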
'hist' triggers add a 'hist' file to each event's subdirectory.
Reading the 'hist' file for the event will dump the hash table in
its entirety to stdout. If there are multiple hist triggers
attached to an event, there will be a table for each trigger in the
output. The table displayed for a named trigger will be the same as
any other instance having the same name. Each printed hash table
entry is a simple list of the keys and values comprising the entry;
keys are printed first and are delineated by curly braces, and are
followed by the set of value fields for the entry. By default,
numeric fields are displayed as base-10 integers. This can be
modified by appending any of the following modifiers to the field
name:

    .hex        display a number as a hex value
    .sym        display an address as a symbol
    .sym-offset display an address as a symbol and offset
    .syscall    display a syscall id as a system call name
    .execname   display a common_pid as a program name

Note that in general the semantics of a given field aren't
interpreted when applying a modifier to it, but there are some
restrictions to be aware of in this regard:

- only the 'hex' modifier can be used for values (because values
  are essentially sums, and the other modifiers don't make sense
  in that context).
- the 'execname' modifier can only be used on a 'common_pid'. The
  reason for this is that the execname is simply the 'comm' value
  saved for the 'current' process when an event was triggered,
  which is the same as the common_pid value saved by the event
  tracing code. Trying to apply that comm value to other pid
  values wouldn't be correct, and typically events that care about
  it save pid-specific comm fields in the event itself.

A typical usage scenario would be the following, which enables a
hist trigger, reads its current contents, and then turns it off:

# echo 'hist:keys=skbaddr.hex:vals=len' > \
      /sys/kernel/debug/tracing/events/net/netif_rx/trigger

# cat /sys/kernel/debug/tracing/events/net/netif_rx/hist

# echo '!hist:keys=skbaddr.hex:vals=len' > \
      /sys/kernel/debug/tracing/events/net/netif_rx/trigger

The trigger file itself can be read to show the details of the
currently attached hist trigger. This information is also displayed
at the top of the 'hist' file when read.

By default, the size of the hash table is 2048 entries. The 'size'
parameter can be used to specify more or fewer than that. The units
are in terms of hashtable entries - if a run uses more entries than
specified, the results will show the number of 'drops', the number
of hits that were ignored. The size should be a power of 2 between
128 and 131072 (any non-power-of-2 number specified will be rounded
up).

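For instance, to give the kmalloc trigger used in the examples below
a larger table (an illustrative sketch - 4096 is simply a power of 2
in the allowed range):

# echo 'hist:keys=call_site:vals=bytes_req:size=4096' > \
      /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
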
The 'sort' parameter can be used to specify a value field to sort
on. The default if unspecified is 'hitcount' and the default sort
order is 'ascending'. To sort in the opposite direction, append
'.descending' to the sort key.

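For example (this is shown in full in the kmalloc examples below):

# echo 'hist:keys=call_site:vals=bytes_req:sort=bytes_req.descending' > \
      /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
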
The 'pause' parameter can be used to pause an existing hist trigger
or to start a hist trigger but not log any events until told to do
so. 'continue' or 'cont' can be used to start or restart a paused
hist trigger.

The 'clear' parameter will clear the contents of a running hist
trigger and leave its current paused/active state unchanged.

Note that the 'pause', 'cont', and 'clear' parameters should be
applied using the 'append' shell operator ('>>') if applied to an
existing trigger, rather than via the '>' operator, which will cause
the trigger to be removed through truncation.

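For example, to pause and later resume the netif_rx trigger shown
above (a sketch following the pause/cont examples later in this
document):

# echo 'hist:keys=skbaddr.hex:vals=len:pause' >> \
      /sys/kernel/debug/tracing/events/net/netif_rx/trigger

# echo 'hist:keys=skbaddr.hex:vals=len:cont' >> \
      /sys/kernel/debug/tracing/events/net/netif_rx/trigger
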
- enable_hist/disable_hist

The enable_hist and disable_hist triggers can be used to have one
event conditionally start and stop another event's already-attached
hist trigger. Any number of enable_hist and disable_hist triggers
can be attached to a given event, allowing that event to kick off
and stop aggregations on a host of other events.

The format is very similar to the enable/disable_event triggers:

    enable_hist:<system>:<event>[:count]
    disable_hist:<system>:<event>[:count]

Instead of enabling or disabling the tracing of the target event
into the trace buffer as the enable/disable_event triggers do, the
enable/disable_hist triggers enable or disable the aggregation of
the target event into a hash table.

A typical usage scenario for the enable_hist/disable_hist triggers
would be to first set up a paused hist trigger on some event,
followed by an enable_hist/disable_hist pair that turns the hist
aggregation on and off when conditions of interest are hit:

# echo 'hist:keys=skbaddr.hex:vals=len:pause' > \
      /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger

# echo 'enable_hist:net:netif_receive_skb if filename==/usr/bin/wget' > \
      /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger

# echo 'disable_hist:net:netif_receive_skb if comm==wget' > \
      /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger

The above sets up an initially paused hist trigger which is unpaused
and starts aggregating events when a given program is executed, and
which stops aggregating when the process exits and the hist trigger
is paused again.

The examples below provide a more concrete illustration of the
concepts and typical usage patterns discussed above.


6.2 'hist' trigger examples
---------------------------

The first set of examples creates aggregations using the kmalloc
event. The fields that can be used for the hist trigger are listed
in the kmalloc event's format file:

# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/format
name: kmalloc
ID: 374
format:
    field:unsigned short common_type; offset:0; size:2; signed:0;
    field:unsigned char common_flags; offset:2; size:1; signed:0;
    field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
    field:int common_pid; offset:4; size:4; signed:1;

    field:unsigned long call_site; offset:8; size:8; signed:0;
    field:const void * ptr; offset:16; size:8; signed:0;
    field:size_t bytes_req; offset:24; size:8; signed:0;
    field:size_t bytes_alloc; offset:32; size:8; signed:0;
    field:gfp_t gfp_flags; offset:40; size:4; signed:0;

We'll start by creating a hist trigger that generates a simple table
that lists the total number of bytes requested for each function in
the kernel that made one or more calls to kmalloc:

# echo 'hist:key=call_site:val=bytes_req' > \
      /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger

This tells the tracing system to create a 'hist' trigger using the
call_site field of the kmalloc event as the key for the table, which
just means that each unique call_site address will have an entry
created for it in the table. The 'val=bytes_req' parameter tells
the hist trigger that for each unique entry (call_site) in the
table, it should keep a running total of the number of bytes
requested by that call_site.

We'll let it run for a while and then dump the contents of the
'hist' file in the kmalloc event's subdirectory (for readability, a
number of entries have been omitted):

# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
# trigger info: hist:keys=call_site:vals=bytes_req:sort=hitcount:size=2048 [active]

{ call_site: 18446744072106379007 } hitcount: 1 bytes_req: 176
{ call_site: 18446744071579557049 } hitcount: 1 bytes_req: 1024
{ call_site: 18446744071580608289 } hitcount: 1 bytes_req: 16384
{ call_site: 18446744071581827654 } hitcount: 1 bytes_req: 24
{ call_site: 18446744071580700980 } hitcount: 1 bytes_req: 8
{ call_site: 18446744071579359876 } hitcount: 1 bytes_req: 152
{ call_site: 18446744071580795365 } hitcount: 3 bytes_req: 144
{ call_site: 18446744071581303129 } hitcount: 3 bytes_req: 144
{ call_site: 18446744071580713234 } hitcount: 4 bytes_req: 2560
{ call_site: 18446744071580933750 } hitcount: 4 bytes_req: 736
.
.
.
{ call_site: 18446744072106047046 } hitcount: 69 bytes_req: 5576
{ call_site: 18446744071582116407 } hitcount: 73 bytes_req: 2336
{ call_site: 18446744072106054684 } hitcount: 136 bytes_req: 140504
{ call_site: 18446744072106224230 } hitcount: 136 bytes_req: 19584
{ call_site: 18446744072106078074 } hitcount: 153 bytes_req: 2448
{ call_site: 18446744072106062406 } hitcount: 153 bytes_req: 36720
{ call_site: 18446744071582507929 } hitcount: 153 bytes_req: 37088
{ call_site: 18446744072102520590 } hitcount: 273 bytes_req: 10920
{ call_site: 18446744071582143559 } hitcount: 358 bytes_req: 716
{ call_site: 18446744072106465852 } hitcount: 417 bytes_req: 56712
{ call_site: 18446744072102523378 } hitcount: 485 bytes_req: 27160
{ call_site: 18446744072099568646 } hitcount: 1676 bytes_req: 33520

Totals:
    Hits: 4610
    Entries: 45
    Dropped: 0

The output displays a line for each entry, beginning with the key
specified in the trigger, followed by the value(s) also specified in
the trigger. At the beginning of the output is a line that displays
the trigger info, which can also be displayed by reading the
'trigger' file:

# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
hist:keys=call_site:vals=bytes_req:sort=hitcount:size=2048 [active]

At the end of the output are a few lines that display the overall
totals for the run. The 'Hits' field shows the total number of
times the event trigger was hit, the 'Entries' field shows the total
number of used entries in the hash table, and the 'Dropped' field
shows the number of hits that were dropped because the number of
used entries for the run exceeded the maximum number of entries
allowed for the table (normally 0; if not, it's a hint that you may
want to increase the size of the table using the 'size' parameter).

Notice in the above output that there's an extra field, 'hitcount',
which wasn't specified in the trigger. Also notice that in the
trigger info output, there's a parameter, 'sort=hitcount', which
wasn't specified in the trigger either. The reason for that is that
every trigger implicitly keeps a count of the total number of hits
attributed to a given entry, called the 'hitcount'. That hitcount
information is explicitly displayed in the output, and in the
absence of a user-specified sort parameter, is used as the default
sort field.

The value 'hitcount' can be used in place of an explicit value in
the 'values' parameter if you don't really need to have any
particular field summed and are mainly interested in hit
frequencies.

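For example, the kmalloc trigger above could be reduced to a pure
hit-frequency table (an illustrative sketch):

# echo 'hist:key=call_site:val=hitcount' > \
      /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
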
To turn the hist trigger off, simply call up the trigger in the
command history and re-execute it with a '!' prepended:

# echo '!hist:key=call_site:val=bytes_req' > \
      /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger

Finally, notice that the call_site as displayed in the output above
isn't really very useful. It's an address, but normally addresses
are displayed in hex. To have a numeric field displayed as a hex
value, simply append '.hex' to the field name in the trigger:

# echo 'hist:key=call_site.hex:val=bytes_req' > \
      /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger

# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
# trigger info: hist:keys=call_site.hex:vals=bytes_req:sort=hitcount:size=2048 [active]

{ call_site: ffffffffa026b291 } hitcount: 1 bytes_req: 433
{ call_site: ffffffffa07186ff } hitcount: 1 bytes_req: 176
{ call_site: ffffffff811ae721 } hitcount: 1 bytes_req: 16384
{ call_site: ffffffff811c5134 } hitcount: 1 bytes_req: 8
{ call_site: ffffffffa04a9ebb } hitcount: 1 bytes_req: 511
{ call_site: ffffffff8122e0a6 } hitcount: 1 bytes_req: 12
{ call_site: ffffffff8107da84 } hitcount: 1 bytes_req: 152
{ call_site: ffffffff812d8246 } hitcount: 1 bytes_req: 24
{ call_site: ffffffff811dc1e5 } hitcount: 3 bytes_req: 144
{ call_site: ffffffffa02515e8 } hitcount: 3 bytes_req: 648
{ call_site: ffffffff81258159 } hitcount: 3 bytes_req: 144
{ call_site: ffffffff811c80f4 } hitcount: 4 bytes_req: 544
.
.
.
{ call_site: ffffffffa06c7646 } hitcount: 106 bytes_req: 8024
{ call_site: ffffffffa06cb246 } hitcount: 132 bytes_req: 31680
{ call_site: ffffffffa06cef7a } hitcount: 132 bytes_req: 2112
{ call_site: ffffffff8137e399 } hitcount: 132 bytes_req: 23232
{ call_site: ffffffffa06c941c } hitcount: 185 bytes_req: 171360
{ call_site: ffffffffa06f2a66 } hitcount: 185 bytes_req: 26640
{ call_site: ffffffffa036a70e } hitcount: 265 bytes_req: 10600
{ call_site: ffffffff81325447 } hitcount: 292 bytes_req: 584
{ call_site: ffffffffa072da3c } hitcount: 446 bytes_req: 60656
{ call_site: ffffffffa036b1f2 } hitcount: 526 bytes_req: 29456
{ call_site: ffffffffa0099c06 } hitcount: 1780 bytes_req: 35600

Totals:
    Hits: 4775
    Entries: 46
    Dropped: 0

Even that's only marginally more useful - while hex values do look
more like addresses, what users are typically more interested in
when looking at text addresses are the corresponding symbols
instead. To have an address displayed as a symbolic value instead,
simply append '.sym' or '.sym-offset' to the field name in the
trigger:

# echo 'hist:key=call_site.sym:val=bytes_req' > \
      /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger

# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
# trigger info: hist:keys=call_site.sym:vals=bytes_req:sort=hitcount:size=2048 [active]

{ call_site: [ffffffff810adcb9] syslog_print_all } hitcount: 1 bytes_req: 1024
{ call_site: [ffffffff8154bc62] usb_control_msg } hitcount: 1 bytes_req: 8
{ call_site: [ffffffffa00bf6fe] hidraw_send_report [hid] } hitcount: 1 bytes_req: 7
{ call_site: [ffffffff8154acbe] usb_alloc_urb } hitcount: 1 bytes_req: 192
{ call_site: [ffffffffa00bf1ca] hidraw_report_event [hid] } hitcount: 1 bytes_req: 7
{ call_site: [ffffffff811e3a25] __seq_open_private } hitcount: 1 bytes_req: 40
{ call_site: [ffffffff8109524a] alloc_fair_sched_group } hitcount: 2 bytes_req: 128
{ call_site: [ffffffff811febd5] fsnotify_alloc_group } hitcount: 2 bytes_req: 528
{ call_site: [ffffffff81440f58] __tty_buffer_request_room } hitcount: 2 bytes_req: 2624
{ call_site: [ffffffff81200ba6] inotify_new_group } hitcount: 2 bytes_req: 96
{ call_site: [ffffffffa05e19af] ieee80211_start_tx_ba_session [mac80211] } hitcount: 2 bytes_req: 464
{ call_site: [ffffffff81672406] tcp_get_metrics } hitcount: 2 bytes_req: 304
{ call_site: [ffffffff81097ec2] alloc_rt_sched_group } hitcount: 2 bytes_req: 128
{ call_site: [ffffffff81089b05] sched_create_group } hitcount: 2 bytes_req: 1424
.
.
.
{ call_site: [ffffffffa04a580c] intel_crtc_page_flip [i915] } hitcount: 1185 bytes_req: 123240
{ call_site: [ffffffffa0287592] drm_mode_page_flip_ioctl [drm] } hitcount: 1185 bytes_req: 104280
{ call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state [i915] } hitcount: 1402 bytes_req: 190672
{ call_site: [ffffffff812891ca] ext4_find_extent } hitcount: 1518 bytes_req: 146208
{ call_site: [ffffffffa029070e] drm_vma_node_allow [drm] } hitcount: 1746 bytes_req: 69840
{ call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 2021 bytes_req: 792312
{ call_site: [ffffffffa02911f2] drm_modeset_lock_crtc [drm] } hitcount: 2592 bytes_req: 145152
{ call_site: [ffffffffa0489a66] intel_ring_begin [i915] } hitcount: 2629 bytes_req: 378576
{ call_site: [ffffffffa046041c] i915_gem_execbuffer2 [i915] } hitcount: 2629 bytes_req: 3783248
{ call_site: [ffffffff81325607] apparmor_file_alloc_security } hitcount: 5192 bytes_req: 10384
{ call_site: [ffffffffa00b7c06] hid_report_raw_event [hid] } hitcount: 5529 bytes_req: 110584
{ call_site: [ffffffff8131ebf7] aa_alloc_task_context } hitcount: 21943 bytes_req: 702176
{ call_site: [ffffffff8125847d] ext4_htree_store_dirent } hitcount: 55759 bytes_req: 5074265

Totals:
    Hits: 109928
    Entries: 71
    Dropped: 0

Because the default sort key above is 'hitcount', the above shows
the list of call_sites by increasing hitcount, so that at the bottom
we see the functions that made the most kmalloc calls during the
run. If instead we wanted to see the top kmalloc callers in terms
of the number of bytes requested rather than the number of calls,
and we wanted the top caller to appear at the top, we can use the
'sort' parameter, along with the 'descending' modifier:

# echo 'hist:key=call_site.sym:val=bytes_req:sort=bytes_req.descending' > \
      /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger

# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
# trigger info: hist:keys=call_site.sym:vals=bytes_req:sort=bytes_req.descending:size=2048 [active]

{ call_site: [ffffffffa046041c] i915_gem_execbuffer2 [i915] } hitcount: 2186 bytes_req: 3397464
{ call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 1790 bytes_req: 712176
{ call_site: [ffffffff8125847d] ext4_htree_store_dirent } hitcount: 8132 bytes_req: 513135
{ call_site: [ffffffff811e2a1b] seq_buf_alloc } hitcount: 106 bytes_req: 440128
{ call_site: [ffffffffa0489a66] intel_ring_begin [i915] } hitcount: 2186 bytes_req: 314784
{ call_site: [ffffffff812891ca] ext4_find_extent } hitcount: 2174 bytes_req: 208992
{ call_site: [ffffffff811ae8e1] __kmalloc } hitcount: 8 bytes_req: 131072
{ call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state [i915] } hitcount: 859 bytes_req: 116824
{ call_site: [ffffffffa02911f2] drm_modeset_lock_crtc [drm] } hitcount: 1834 bytes_req: 102704
{ call_site: [ffffffffa04a580c] intel_crtc_page_flip [i915] } hitcount: 972 bytes_req: 101088
{ call_site: [ffffffffa0287592] drm_mode_page_flip_ioctl [drm] } hitcount: 972 bytes_req: 85536
{ call_site: [ffffffffa00b7c06] hid_report_raw_event [hid] } hitcount: 3333 bytes_req: 66664
{ call_site: [ffffffff8137e559] sg_kmalloc } hitcount: 209 bytes_req: 61632
.
.
.
{ call_site: [ffffffff81095225] alloc_fair_sched_group } hitcount: 2 bytes_req: 128
{ call_site: [ffffffff81097ec2] alloc_rt_sched_group } hitcount: 2 bytes_req: 128
{ call_site: [ffffffff812d8406] copy_semundo } hitcount: 2 bytes_req: 48
{ call_site: [ffffffff81200ba6] inotify_new_group } hitcount: 1 bytes_req: 48
{ call_site: [ffffffffa027121a] drm_getmagic [drm] } hitcount: 1 bytes_req: 48
{ call_site: [ffffffff811e3a25] __seq_open_private } hitcount: 1 bytes_req: 40
{ call_site: [ffffffff811c52f4] bprm_change_interp } hitcount: 2 bytes_req: 16
{ call_site: [ffffffff8154bc62] usb_control_msg } hitcount: 1 bytes_req: 8
{ call_site: [ffffffffa00bf1ca] hidraw_report_event [hid] } hitcount: 1 bytes_req: 7
{ call_site: [ffffffffa00bf6fe] hidraw_send_report [hid] } hitcount: 1 bytes_req: 7

Totals:
    Hits: 32133
    Entries: 81
    Dropped: 0

To display the offset and size information in addition to the symbol
name, just use 'sym-offset' instead:

# echo 'hist:key=call_site.sym-offset:val=bytes_req:sort=bytes_req.descending' > \
      /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger

# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
# trigger info: hist:keys=call_site.sym-offset:vals=bytes_req:sort=bytes_req.descending:size=2048 [active]

{ call_site: [ffffffffa046041c] i915_gem_execbuffer2+0x6c/0x2c0 [i915] } hitcount: 4569 bytes_req: 3163720
{ call_site: [ffffffffa0489a66] intel_ring_begin+0xc6/0x1f0 [i915] } hitcount: 4569 bytes_req: 657936
{ call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23+0x694/0x1020 [i915] } hitcount: 1519 bytes_req: 472936
{ call_site: [ffffffffa045e646] i915_gem_do_execbuffer.isra.23+0x516/0x1020 [i915] } hitcount: 3050 bytes_req: 211832
{ call_site: [ffffffff811e2a1b] seq_buf_alloc+0x1b/0x50 } hitcount: 34 bytes_req: 148384
{ call_site: [ffffffffa04a580c] intel_crtc_page_flip+0xbc/0x870 [i915] } hitcount: 1385 bytes_req: 144040
{ call_site: [ffffffff811ae8e1] __kmalloc+0x191/0x1b0 } hitcount: 8 bytes_req: 131072
{ call_site: [ffffffffa0287592] drm_mode_page_flip_ioctl+0x282/0x360 [drm] } hitcount: 1385 bytes_req: 121880
{ call_site: [ffffffffa02911f2] drm_modeset_lock_crtc+0x32/0x100 [drm] } hitcount: 1848 bytes_req: 103488
{ call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state+0x2c/0xa0 [i915] } hitcount: 461 bytes_req: 62696
{ call_site: [ffffffffa029070e] drm_vma_node_allow+0x2e/0xd0 [drm] } hitcount: 1541 bytes_req: 61640
{ call_site: [ffffffff815f8d7b] sk_prot_alloc+0xcb/0x1b0 } hitcount: 57 bytes_req: 57456
.
.
.
{ call_site: [ffffffff8109524a] alloc_fair_sched_group+0x5a/0x1a0 } hitcount: 2 bytes_req: 128
{ call_site: [ffffffffa027b921] drm_vm_open_locked+0x31/0xa0 [drm] } hitcount: 3 bytes_req: 96
{ call_site: [ffffffff8122e266] proc_self_follow_link+0x76/0xb0 } hitcount: 8 bytes_req: 96
{ call_site: [ffffffff81213e80] load_elf_binary+0x240/0x1650 } hitcount: 3 bytes_req: 84
{ call_site: [ffffffff8154bc62] usb_control_msg+0x42/0x110 } hitcount: 1 bytes_req: 8
{ call_site: [ffffffffa00bf6fe] hidraw_send_report+0x7e/0x1a0 [hid] } hitcount: 1 bytes_req: 7
{ call_site: [ffffffffa00bf1ca] hidraw_report_event+0x8a/0x120 [hid] } hitcount: 1 bytes_req: 7

Totals:
    Hits: 26098
    Entries: 64
    Dropped: 0

We can also add multiple fields to the 'values' parameter. For
example, we might want to see the total number of bytes allocated
alongside bytes requested, and display the result sorted by bytes
allocated in a descending order:

# echo 'hist:keys=call_site.sym:values=bytes_req,bytes_alloc:sort=bytes_alloc.descending' > \
      /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger

# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
# trigger info: hist:keys=call_site.sym:vals=bytes_req,bytes_alloc:sort=bytes_alloc.descending:size=2048 [active]

{ call_site: [ffffffffa046041c] i915_gem_execbuffer2 [i915] } hitcount: 7403 bytes_req: 4084360 bytes_alloc: 5958016
{ call_site: [ffffffff811e2a1b] seq_buf_alloc } hitcount: 541 bytes_req: 2213968 bytes_alloc: 2228224
{ call_site: [ffffffffa0489a66] intel_ring_begin [i915] } hitcount: 7404 bytes_req: 1066176 bytes_alloc: 1421568
{ call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 1565 bytes_req: 557368 bytes_alloc: 1037760
{ call_site: [ffffffff8125847d] ext4_htree_store_dirent } hitcount: 9557 bytes_req: 595778 bytes_alloc: 695744
{ call_site: [ffffffffa045e646] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 5839 bytes_req: 430680 bytes_alloc: 470400
{ call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state [i915] } hitcount: 2388 bytes_req: 324768 bytes_alloc: 458496
{ call_site: [ffffffffa02911f2] drm_modeset_lock_crtc [drm] } hitcount: 3911 bytes_req: 219016 bytes_alloc: 250304
{ call_site: [ffffffff815f8d7b] sk_prot_alloc } hitcount: 235 bytes_req: 236880 bytes_alloc: 240640
{ call_site: [ffffffff8137e559] sg_kmalloc } hitcount: 557 bytes_req: 169024 bytes_alloc: 221760
{ call_site: [ffffffffa00b7c06] hid_report_raw_event [hid] } hitcount: 9378 bytes_req: 187548 bytes_alloc: 206312
{ call_site: [ffffffffa04a580c] intel_crtc_page_flip [i915] } hitcount: 1519 bytes_req: 157976 bytes_alloc: 194432
.
.
.
{ call_site: [ffffffff8109bd3b] sched_autogroup_create_attach } hitcount: 2 bytes_req: 144 bytes_alloc: 192
{ call_site: [ffffffff81097ee8] alloc_rt_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128
{ call_site: [ffffffff8109524a] alloc_fair_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128
{ call_site: [ffffffff81095225] alloc_fair_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128
{ call_site: [ffffffff81097ec2] alloc_rt_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128
{ call_site: [ffffffff81213e80] load_elf_binary } hitcount: 3 bytes_req: 84 bytes_alloc: 96
{ call_site: [ffffffff81079a2e] kthread_create_on_node } hitcount: 1 bytes_req: 56 bytes_alloc: 64
{ call_site: [ffffffffa00bf6fe] hidraw_send_report [hid] } hitcount: 1 bytes_req: 7 bytes_alloc: 8
{ call_site: [ffffffff8154bc62] usb_control_msg } hitcount: 1 bytes_req: 8 bytes_alloc: 8
{ call_site: [ffffffffa00bf1ca] hidraw_report_event [hid] } hitcount: 1 bytes_req: 7 bytes_alloc: 8

Totals:
    Hits: 66598
    Entries: 65
    Dropped: 0

Finally, to finish off our kmalloc example, instead of simply having
the hist trigger display symbolic call_sites, we can have the hist
trigger additionally display the complete set of kernel stack traces
that led to each call_site. To do that, we simply use the special
value 'stacktrace' for the key parameter:

# echo 'hist:keys=stacktrace:values=bytes_req,bytes_alloc:sort=bytes_alloc' > \
      /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger

The above trigger will use the kernel stack trace in effect when an
event is triggered as the key for the hash table. This allows the
enumeration of every kernel callpath that led up to a particular
event, along with a running total of any of the event fields for
that event. Here we tally bytes requested and bytes allocated for
every callpath in the system that led up to a kmalloc (in this case
every callpath to a kmalloc for a kernel compile):

# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
# trigger info: hist:keys=stacktrace:vals=bytes_req,bytes_alloc:sort=bytes_alloc:size=2048 [active]

{ stacktrace:
         __kmalloc_track_caller+0x10b/0x1a0
         kmemdup+0x20/0x50
         hidraw_report_event+0x8a/0x120 [hid]
         hid_report_raw_event+0x3ea/0x440 [hid]
         hid_input_report+0x112/0x190 [hid]
         hid_irq_in+0xc2/0x260 [usbhid]
         __usb_hcd_giveback_urb+0x72/0x120
         usb_giveback_urb_bh+0x9e/0xe0
         tasklet_hi_action+0xf8/0x100
         __do_softirq+0x114/0x2c0
         irq_exit+0xa5/0xb0
         do_IRQ+0x5a/0xf0
         ret_from_intr+0x0/0x30
         cpuidle_enter+0x17/0x20
         cpu_startup_entry+0x315/0x3e0
         rest_init+0x7c/0x80
} hitcount: 3 bytes_req: 21 bytes_alloc: 24
{ stacktrace:
         __kmalloc_track_caller+0x10b/0x1a0
         kmemdup+0x20/0x50
         hidraw_report_event+0x8a/0x120 [hid]
         hid_report_raw_event+0x3ea/0x440 [hid]
         hid_input_report+0x112/0x190 [hid]
         hid_irq_in+0xc2/0x260 [usbhid]
         __usb_hcd_giveback_urb+0x72/0x120
         usb_giveback_urb_bh+0x9e/0xe0
         tasklet_hi_action+0xf8/0x100
         __do_softirq+0x114/0x2c0
         irq_exit+0xa5/0xb0
         do_IRQ+0x5a/0xf0
         ret_from_intr+0x0/0x30
} hitcount: 3 bytes_req: 21 bytes_alloc: 24
{ stacktrace:
         kmem_cache_alloc_trace+0xeb/0x150
         aa_alloc_task_context+0x27/0x40
         apparmor_cred_prepare+0x1f/0x50
         security_prepare_creds+0x16/0x20
         prepare_creds+0xdf/0x1a0
         SyS_capset+0xb5/0x200
         system_call_fastpath+0x12/0x6a
} hitcount: 1 bytes_req: 32 bytes_alloc: 32
.
.
.
{ stacktrace:
         __kmalloc+0x11b/0x1b0
         i915_gem_execbuffer2+0x6c/0x2c0 [i915]
         drm_ioctl+0x349/0x670 [drm]
         do_vfs_ioctl+0x2f0/0x4f0
         SyS_ioctl+0x81/0xa0
         system_call_fastpath+0x12/0x6a
} hitcount: 17726 bytes_req: 13944120 bytes_alloc: 19593808
{ stacktrace:
         __kmalloc+0x11b/0x1b0
         load_elf_phdrs+0x76/0xa0
         load_elf_binary+0x102/0x1650
         search_binary_handler+0x97/0x1d0
         do_execveat_common.isra.34+0x551/0x6e0
         SyS_execve+0x3a/0x50
         return_from_execve+0x0/0x23
} hitcount: 33348 bytes_req: 17152128 bytes_alloc: 20226048
{ stacktrace:
         kmem_cache_alloc_trace+0xeb/0x150
         apparmor_file_alloc_security+0x27/0x40
         security_file_alloc+0x16/0x20
         get_empty_filp+0x93/0x1c0
         path_openat+0x31/0x5f0
         do_filp_open+0x3a/0x90
         do_sys_open+0x128/0x220
         SyS_open+0x1e/0x20
         system_call_fastpath+0x12/0x6a
} hitcount: 4766422 bytes_req: 9532844 bytes_alloc: 38131376
{ stacktrace:
         __kmalloc+0x11b/0x1b0
         seq_buf_alloc+0x1b/0x50
         seq_read+0x2cc/0x370
         proc_reg_read+0x3d/0x80
         __vfs_read+0x28/0xe0
         vfs_read+0x86/0x140
         SyS_read+0x46/0xb0
         system_call_fastpath+0x12/0x6a
} hitcount: 19133 bytes_req: 78368768 bytes_alloc: 78368768

Totals:
    Hits: 6085872
    Entries: 253
    Dropped: 0

If you key a hist trigger on common_pid, in order for example to
gather and display sorted totals for each process, you can use the
special .execname modifier to display the executable names for the
processes in the table rather than raw pids. The example below
keeps a per-process sum of total bytes read:

# echo 'hist:key=common_pid.execname:val=count:sort=count.descending' > \
      /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger

# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/hist
# trigger info: hist:keys=common_pid.execname:vals=count:sort=count.descending:size=2048 [active]

{ common_pid: gnome-terminal [ 3196] } hitcount: 280 count: 1093512
{ common_pid: Xorg [ 1309] } hitcount: 525 count: 256640
{ common_pid: compiz [ 2889] } hitcount: 59 count: 254400
{ common_pid: bash [ 8710] } hitcount: 3 count: 66369
{ common_pid: dbus-daemon-lau [ 8703] } hitcount: 49 count: 47739
{ common_pid: irqbalance [ 1252] } hitcount: 27 count: 27648
{ common_pid: 01ifupdown [ 8705] } hitcount: 3 count: 17216
{ common_pid: dbus-daemon [ 772] } hitcount: 10 count: 12396
{ common_pid: Socket Thread [ 8342] } hitcount: 11 count: 11264
{ common_pid: nm-dhcp-client. [ 8701] } hitcount: 6 count: 7424
{ common_pid: gmain [ 1315] } hitcount: 18 count: 6336
.
.
.
{ common_pid: postgres [ 1892] } hitcount: 2 count: 32
{ common_pid: postgres [ 1891] } hitcount: 2 count: 32
{ common_pid: gmain [ 8704] } hitcount: 2 count: 32
{ common_pid: upstart-dbus-br [ 2740] } hitcount: 21 count: 21
{ common_pid: nm-dispatcher.a [ 8696] } hitcount: 1 count: 16
{ common_pid: indicator-datet [ 2904] } hitcount: 1 count: 16
{ common_pid: gdbus [ 2998] } hitcount: 1 count: 16
{ common_pid: rtkit-daemon [ 2052] } hitcount: 1 count: 8
{ common_pid: init [ 1] } hitcount: 2 count: 2

Totals:
    Hits: 2116
    Entries: 51
    Dropped: 0

Similarly, if you key a hist trigger on syscall id, for example to
gather and display a list of systemwide syscall hits, you can use
the special .syscall modifier to display the syscall names rather
than raw ids. The example below keeps a running total of syscall
counts for the system during the run:

# echo 'hist:key=id.syscall:val=hitcount' > \
      /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger

# cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
# trigger info: hist:keys=id.syscall:vals=hitcount:sort=hitcount:size=2048 [active]

{ id: sys_fsync [ 74] } hitcount: 1
{ id: sys_newuname [ 63] } hitcount: 1
{ id: sys_prctl [157] } hitcount: 1
{ id: sys_statfs [137] } hitcount: 1
{ id: sys_symlink [ 88] } hitcount: 1
{ id: sys_sendmmsg [307] } hitcount: 1
{ id: sys_semctl [ 66] } hitcount: 1
{ id: sys_readlink [ 89] } hitcount: 3
{ id: sys_bind [ 49] } hitcount: 3
{ id: sys_getsockname [ 51] } hitcount: 3
{ id: sys_unlink [ 87] } hitcount: 3
{ id: sys_rename [ 82] } hitcount: 4
{ id: unknown_syscall [ 58] } hitcount: 4
{ id: sys_connect [ 42] } hitcount: 4
{ id: sys_getpid [ 39] } hitcount: 4
.
.
.
{ id: sys_rt_sigprocmask [ 14] } hitcount: 952
{ id: sys_futex [202] } hitcount: 1534
{ id: sys_write [ 1] } hitcount: 2689
{ id: sys_setitimer [ 38] } hitcount: 2797
{ id: sys_read [ 0] } hitcount: 3202
{ id: sys_select [ 23] } hitcount: 3773
{ id: sys_writev [ 20] } hitcount: 4531
{ id: sys_poll [ 7] } hitcount: 8314
{ id: sys_recvmsg [ 47] } hitcount: 13738
{ id: sys_ioctl [ 16] } hitcount: 21843

Totals:
    Hits: 67612
    Entries: 72
    Dropped: 0

The syscall counts above provide a rough overall picture of system
call activity on the system; we can see for example that the most
popular system call on this system was the 'sys_ioctl' system call.

We can use 'compound' keys to refine that number and provide some
further insight as to which processes exactly contribute to the
overall ioctl count.

The command below keeps a hitcount for every unique combination of
system call id and pid - the end result is essentially a table that
keeps a per-pid count of hits for each system call. The results are
sorted using the system call id as the primary key, and the hitcount
sum as the secondary key:

# echo 'hist:key=id.syscall,common_pid.execname:val=hitcount:sort=id,hitcount' > \
      /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger

# cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
# trigger info: hist:keys=id.syscall,common_pid.execname:vals=hitcount:sort=id.syscall,hitcount:size=2048 [active]

{ id: sys_read [ 0], common_pid: rtkit-daemon [ 1877] } hitcount: 1
{ id: sys_read [ 0], common_pid: gdbus [ 2976] } hitcount: 1
{ id: sys_read [ 0], common_pid: console-kit-dae [ 3400] } hitcount: 1
{ id: sys_read [ 0], common_pid: postgres [ 1865] } hitcount: 1
{ id: sys_read [ 0], common_pid: deja-dup-monito [ 3543] } hitcount: 2
{ id: sys_read [ 0], common_pid: NetworkManager [ 890] } hitcount: 2
{ id: sys_read [ 0], common_pid: evolution-calen [ 3048] } hitcount: 2
{ id: sys_read [ 0], common_pid: postgres [ 1864] } hitcount: 2
{ id: sys_read [ 0], common_pid: nm-applet [ 3022] } hitcount: 2
{ id: sys_read [ 0], common_pid: whoopsie [ 1212] } hitcount: 2
.
.
.
{ id: sys_ioctl [ 16], common_pid: bash [ 8479] } hitcount: 1
{ id: sys_ioctl [ 16], common_pid: bash [ 3472] } hitcount: 12
{ id: sys_ioctl [ 16], common_pid: gnome-terminal [ 3199] } hitcount: 16
{ id: sys_ioctl [ 16], common_pid: Xorg [ 1267] } hitcount: 1808
{ id: sys_ioctl [ 16], common_pid: compiz [ 2994] } hitcount: 5580
.
.
.
{ id: sys_waitid [247], common_pid: upstart-dbus-br [ 2690] } hitcount: 3
{ id: sys_waitid [247], common_pid: upstart-dbus-br [ 2688] } hitcount: 16
{ id: sys_inotify_add_watch [254], common_pid: gmain [ 975] } hitcount: 2
{ id: sys_inotify_add_watch [254], common_pid: gmain [ 3204] } hitcount: 4
{ id: sys_inotify_add_watch [254], common_pid: gmain [ 2888] } hitcount: 4
{ id: sys_inotify_add_watch [254], common_pid: gmain [ 3003] } hitcount: 4
{ id: sys_inotify_add_watch [254], common_pid: gmain [ 2873] } hitcount: 4
{ id: sys_inotify_add_watch [254], common_pid: gmain [ 3196] } hitcount: 6
{ id: sys_openat [257], common_pid: java [ 2623] } hitcount: 2
{ id: sys_eventfd2 [290], common_pid: ibus-ui-gtk3 [ 2760] } hitcount: 4
{ id: sys_eventfd2 [290], common_pid: compiz [ 2994] } hitcount: 6

Totals:
    Hits: 31536
    Entries: 323
    Dropped: 0

The above list does give us a breakdown of the ioctl syscall by
pid, but it also gives us quite a bit more than that, which we
don't really care about at the moment. Since we know the syscall
id for sys_ioctl (16, displayed next to the sys_ioctl name), we
can use that to filter out all the other syscalls:

# echo 'hist:key=id.syscall,common_pid.execname:val=hitcount:sort=id,hitcount if id == 16' > \
      /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger

# cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
# trigger info: hist:keys=id.syscall,common_pid.execname:vals=hitcount:sort=id.syscall,hitcount:size=2048 if id == 16 [active]

{ id: sys_ioctl [ 16], common_pid: gmain [ 2769] } hitcount: 1
{ id: sys_ioctl [ 16], common_pid: evolution-addre [ 8571] } hitcount: 1
{ id: sys_ioctl [ 16], common_pid: gmain [ 3003] } hitcount: 1
{ id: sys_ioctl [ 16], common_pid: gmain [ 2781] } hitcount: 1
{ id: sys_ioctl [ 16], common_pid: gmain [ 2829] } hitcount: 1
{ id: sys_ioctl [ 16], common_pid: bash [ 8726] } hitcount: 1
{ id: sys_ioctl [ 16], common_pid: bash [ 8508] } hitcount: 1
{ id: sys_ioctl [ 16], common_pid: gmain [ 2970] } hitcount: 1
{ id: sys_ioctl [ 16], common_pid: gmain [ 2768] } hitcount: 1
.
.
.
{ id: sys_ioctl [ 16], common_pid: pool [ 8559] } hitcount: 45
{ id: sys_ioctl [ 16], common_pid: pool [ 8555] } hitcount: 48
{ id: sys_ioctl [ 16], common_pid: pool [ 8551] } hitcount: 48
{ id: sys_ioctl [ 16], common_pid: avahi-daemon [ 896] } hitcount: 66
{ id: sys_ioctl [ 16], common_pid: Xorg [ 1267] } hitcount: 26674
{ id: sys_ioctl [ 16], common_pid: compiz [ 2994] } hitcount: 73443

Totals:
    Hits: 101162
    Entries: 103
    Dropped: 0

The above output shows that 'compiz' and 'Xorg' are far and away
the heaviest ioctl callers (which might lead to questions about
whether they really need to be making all those calls and to
possible avenues for further investigation).

The compound key examples used a key and a sum value (hitcount) to
sort the output, but we can just as easily use two keys instead.
Here's an example where we use a compound key composed of the
common_pid and size event fields. Sorting with pid as the primary
key and 'size' as the secondary key allows us to display an
ordered summary of the recvfrom sizes, with counts, received by
each process:

# echo 'hist:key=common_pid.execname,size:val=hitcount:sort=common_pid,size' > \
      /sys/kernel/debug/tracing/events/syscalls/sys_enter_recvfrom/trigger

# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_recvfrom/hist
# trigger info: hist:keys=common_pid.execname,size:vals=hitcount:sort=common_pid.execname,size:size=2048 [active]

{ common_pid: smbd [ 784], size: 4 } hitcount: 1
{ common_pid: dnsmasq [ 1412], size: 4096 } hitcount: 672
{ common_pid: postgres [ 1796], size: 1000 } hitcount: 6
{ common_pid: postgres [ 1867], size: 1000 } hitcount: 10
{ common_pid: bamfdaemon [ 2787], size: 28 } hitcount: 2
{ common_pid: bamfdaemon [ 2787], size: 14360 } hitcount: 1
{ common_pid: compiz [ 2994], size: 8 } hitcount: 1
{ common_pid: compiz [ 2994], size: 20 } hitcount: 11
{ common_pid: gnome-terminal [ 3199], size: 4 } hitcount: 2
{ common_pid: firefox [ 8817], size: 4 } hitcount: 1
{ common_pid: firefox [ 8817], size: 8 } hitcount: 5
{ common_pid: firefox [ 8817], size: 588 } hitcount: 2
{ common_pid: firefox [ 8817], size: 628 } hitcount: 1
{ common_pid: firefox [ 8817], size: 6944 } hitcount: 1
{ common_pid: firefox [ 8817], size: 408880 } hitcount: 2
{ common_pid: firefox [ 8822], size: 8 } hitcount: 2
{ common_pid: firefox [ 8822], size: 160 } hitcount: 2
{ common_pid: firefox [ 8822], size: 320 } hitcount: 2
{ common_pid: firefox [ 8822], size: 352 } hitcount: 1
.
.
.
{ common_pid: pool [ 8923], size: 1960 } hitcount: 10
{ common_pid: pool [ 8923], size: 2048 } hitcount: 10
{ common_pid: pool [ 8924], size: 1960 } hitcount: 10
{ common_pid: pool [ 8924], size: 2048 } hitcount: 10
{ common_pid: pool [ 8928], size: 1964 } hitcount: 4
{ common_pid: pool [ 8928], size: 1965 } hitcount: 2
{ common_pid: pool [ 8928], size: 2048 } hitcount: 6
{ common_pid: pool [ 8929], size: 1982 } hitcount: 1
{ common_pid: pool [ 8929], size: 2048 } hitcount: 1

Totals:
    Hits: 2016
    Entries: 224
    Dropped: 0

The above example also illustrates the fact that although a compound
key is treated as a single entity for hashing purposes, the sub-keys
it's composed of can be accessed independently.

The next example uses a string field as the hash key and
demonstrates how you can manually pause and continue a hist trigger.
In this example, we'll aggregate fork counts and don't expect a
large number of entries in the hash table, so we'll drop it to a
much smaller number, say 256:

# echo 'hist:key=child_comm:val=hitcount:size=256' > \
      /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger

# cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist
# trigger info: hist:keys=child_comm:vals=hitcount:sort=hitcount:size=256 [active]

{ child_comm: dconf worker } hitcount: 1
{ child_comm: ibus-daemon } hitcount: 1
{ child_comm: whoopsie } hitcount: 1
{ child_comm: smbd } hitcount: 1
{ child_comm: gdbus } hitcount: 1
{ child_comm: kthreadd } hitcount: 1
{ child_comm: dconf worker } hitcount: 1
{ child_comm: evolution-alarm } hitcount: 2
{ child_comm: Socket Thread } hitcount: 2
{ child_comm: postgres } hitcount: 2
{ child_comm: bash } hitcount: 3
{ child_comm: compiz } hitcount: 3
{ child_comm: evolution-sourc } hitcount: 4
{ child_comm: dhclient } hitcount: 4
{ child_comm: pool } hitcount: 5
{ child_comm: nm-dispatcher.a } hitcount: 8
{ child_comm: firefox } hitcount: 8
{ child_comm: dbus-daemon } hitcount: 8
{ child_comm: glib-pacrunner } hitcount: 10
{ child_comm: evolution } hitcount: 23

Totals:
    Hits: 89
    Entries: 20
    Dropped: 0

If we want to pause the hist trigger, we can simply append :pause to
the command that started the trigger. Notice that the trigger info
displays as [paused]:

# echo 'hist:key=child_comm:val=hitcount:size=256:pause' >> \
      /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger

# cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist
# trigger info: hist:keys=child_comm:vals=hitcount:sort=hitcount:size=256 [paused]

{ child_comm: dconf worker } hitcount: 1
{ child_comm: kthreadd } hitcount: 1
{ child_comm: dconf worker } hitcount: 1
{ child_comm: gdbus } hitcount: 1
{ child_comm: ibus-daemon } hitcount: 1
{ child_comm: Socket Thread } hitcount: 2
{ child_comm: evolution-alarm } hitcount: 2
{ child_comm: smbd } hitcount: 2
{ child_comm: bash } hitcount: 3
{ child_comm: whoopsie } hitcount: 3
{ child_comm: compiz } hitcount: 3
{ child_comm: evolution-sourc } hitcount: 4
{ child_comm: pool } hitcount: 5
{ child_comm: postgres } hitcount: 6
{ child_comm: firefox } hitcount: 8
{ child_comm: dhclient } hitcount: 10
{ child_comm: emacs } hitcount: 12
{ child_comm: dbus-daemon } hitcount: 20
{ child_comm: nm-dispatcher.a } hitcount: 20
{ child_comm: evolution } hitcount: 35
{ child_comm: glib-pacrunner } hitcount: 59

Totals:
    Hits: 199
    Entries: 21
    Dropped: 0

To manually continue having the trigger aggregate events, append
:cont instead. Notice that the trigger info displays as [active]
again, and the data has changed:

# echo 'hist:key=child_comm:val=hitcount:size=256:cont' >> \
      /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger

# cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist
# trigger info: hist:keys=child_comm:vals=hitcount:sort=hitcount:size=256 [active]

{ child_comm: dconf worker } hitcount: 1
{ child_comm: dconf worker } hitcount: 1
{ child_comm: kthreadd } hitcount: 1
{ child_comm: gdbus } hitcount: 1
{ child_comm: ibus-daemon } hitcount: 1
{ child_comm: Socket Thread } hitcount: 2
{ child_comm: evolution-alarm } hitcount: 2
{ child_comm: smbd } hitcount: 2
{ child_comm: whoopsie } hitcount: 3
{ child_comm: compiz } hitcount: 3
{ child_comm: evolution-sourc } hitcount: 4
{ child_comm: bash } hitcount: 5
{ child_comm: pool } hitcount: 5
{ child_comm: postgres } hitcount: 6
{ child_comm: firefox } hitcount: 8
{ child_comm: dhclient } hitcount: 11
{ child_comm: emacs } hitcount: 12
{ child_comm: dbus-daemon } hitcount: 22
{ child_comm: nm-dispatcher.a } hitcount: 22
{ child_comm: evolution } hitcount: 35
{ child_comm: glib-pacrunner } hitcount: 59

Totals:
    Hits: 206
    Entries: 21
    Dropped: 0

The previous example showed how to start and stop a hist trigger by
appending 'pause' and 'continue' to the hist trigger command. A
hist trigger can also be started in a paused state by initially
starting the trigger with ':pause' appended. This allows you to
start the trigger only when you're ready to start collecting data
and not before. For example, you could start the trigger in a
paused state, then unpause it and do something you want to measure,
then pause the trigger again when done.

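A minimal sketch of that manual workflow, reusing the fork trigger
from the previous example - start it paused, unpause it around the
interval of interest, then pause it again:

# echo 'hist:key=child_comm:val=hitcount:size=256:pause' > \
      /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger

  ... when ready to start collecting data ...

# echo 'hist:key=child_comm:val=hitcount:size=256:cont' >> \
      /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger

  ... do whatever it is you want to measure ...

# echo 'hist:key=child_comm:val=hitcount:size=256:pause' >> \
      /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger
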
Of course, doing this manually can be difficult and error-prone, but
it is possible to automatically start and stop a hist trigger based
on some condition, via the enable_hist and disable_hist triggers.

For example, suppose we wanted to take a look at the relative
weights in terms of skb length for each callpath that leads to a
netif_receive_skb event when downloading a decent-sized file using
wget.

First we set up an initially paused stacktrace trigger on the
netif_receive_skb event:

# echo 'hist:key=stacktrace:vals=len:pause' > \
      /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger

Next, we set up an 'enable_hist' trigger on the sched_process_exec
event, with an 'if filename==/usr/bin/wget' filter. The effect of
this new trigger is that it will 'unpause' the hist trigger we just
set up on netif_receive_skb if and only if it sees a
sched_process_exec event with a filename of '/usr/bin/wget'. When
that happens, all netif_receive_skb events are aggregated into a
hash table keyed on stacktrace:

# echo 'enable_hist:net:netif_receive_skb if filename==/usr/bin/wget' > \
      /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger

The aggregation continues until the netif_receive_skb hist trigger
is paused again, which is what the following disable_hist event does
by creating a similar setup on the sched_process_exit event, using
the filter 'comm==wget':

# echo 'disable_hist:net:netif_receive_skb if comm==wget' > \
      /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger

Whenever a process exits and the comm field of the disable_hist
trigger filter matches 'comm==wget', the netif_receive_skb hist
trigger is disabled.

The overall effect is that netif_receive_skb events are aggregated
into the hash table for only the duration of the wget. Executing a
wget command and then listing the 'hist' file will display the
output generated by the wget command:

$ wget https://www.kernel.org/pub/linux/kernel/v3.x/patch-3.19.xz

# cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist
# trigger info: hist:keys=stacktrace:vals=len:sort=hitcount:size=2048 [paused]

{ stacktrace:
         __netif_receive_skb_core+0x46d/0x990
         __netif_receive_skb+0x18/0x60
         netif_receive_skb_internal+0x23/0x90
         napi_gro_receive+0xc8/0x100
         ieee80211_deliver_skb+0xd6/0x270 [mac80211]
         ieee80211_rx_handlers+0xccf/0x22f0 [mac80211]
         ieee80211_prepare_and_rx_handle+0x4e7/0xc40 [mac80211]
         ieee80211_rx+0x31d/0x900 [mac80211]
         iwlagn_rx_reply_rx+0x3db/0x6f0 [iwldvm]
         iwl_rx_dispatch+0x8e/0xf0 [iwldvm]
         iwl_pcie_irq_handler+0xe3c/0x12f0 [iwlwifi]
         irq_thread_fn+0x20/0x50
         irq_thread+0x11f/0x150
         kthread+0xd2/0xf0
         ret_from_fork+0x42/0x70
} hitcount: 85 len: 28884
{ stacktrace:
         __netif_receive_skb_core+0x46d/0x990
         __netif_receive_skb+0x18/0x60
         netif_receive_skb_internal+0x23/0x90
         napi_gro_complete+0xa4/0xe0
         dev_gro_receive+0x23a/0x360
         napi_gro_receive+0x30/0x100
         ieee80211_deliver_skb+0xd6/0x270 [mac80211]
         ieee80211_rx_handlers+0xccf/0x22f0 [mac80211]
         ieee80211_prepare_and_rx_handle+0x4e7/0xc40 [mac80211]
         ieee80211_rx+0x31d/0x900 [mac80211]
         iwlagn_rx_reply_rx+0x3db/0x6f0 [iwldvm]
         iwl_rx_dispatch+0x8e/0xf0 [iwldvm]
         iwl_pcie_irq_handler+0xe3c/0x12f0 [iwlwifi]
         irq_thread_fn+0x20/0x50
         irq_thread+0x11f/0x150
         kthread+0xd2/0xf0
} hitcount: 98 len: 664329
{ stacktrace:
         __netif_receive_skb_core+0x46d/0x990
         __netif_receive_skb+0x18/0x60
         process_backlog+0xa8/0x150
         net_rx_action+0x15d/0x340
         __do_softirq+0x114/0x2c0
         do_softirq_own_stack+0x1c/0x30
         do_softirq+0x65/0x70
         __local_bh_enable_ip+0xb5/0xc0
         ip_finish_output+0x1f4/0x840
         ip_output+0x6b/0xc0
         ip_local_out_sk+0x31/0x40
         ip_send_skb+0x1a/0x50
         udp_send_skb+0x173/0x2a0
         udp_sendmsg+0x2bf/0x9f0
         inet_sendmsg+0x64/0xa0
         sock_sendmsg+0x3d/0x50
} hitcount: 115 len: 13030
{ stacktrace:
         __netif_receive_skb_core+0x46d/0x990
         __netif_receive_skb+0x18/0x60
         netif_receive_skb_internal+0x23/0x90
         napi_gro_complete+0xa4/0xe0
         napi_gro_flush+0x6d/0x90
         iwl_pcie_irq_handler+0x92a/0x12f0 [iwlwifi]
         irq_thread_fn+0x20/0x50
         irq_thread+0x11f/0x150
         kthread+0xd2/0xf0
         ret_from_fork+0x42/0x70
} hitcount: 934 len: 5512212

Totals:
    Hits: 1232
    Entries: 4
    Dropped: 0

The above shows all the netif_receive_skb callpaths and their total
lengths for the duration of the wget command.

The 'clear' hist trigger param can be used to clear the hash table.
Suppose we wanted to try another run of the previous example but
this time also wanted to see the complete list of events that went
into the histogram. In order to avoid having to set everything up
again, we can just clear the histogram first:

# echo 'hist:key=stacktrace:vals=len:clear' >> \
      /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger

Just to verify that it is in fact cleared, here's what we now see in
the hist file:

# cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist
# trigger info: hist:keys=stacktrace:vals=len:sort=hitcount:size=2048 [paused]

Totals:
    Hits: 0
    Entries: 0
    Dropped: 0

1619 Since we want to see the detailed list of every netif_receive_skb
1620 event occurring during the new run, which are in fact the same
1621 events being aggregated into the hash table, we add some additional
1622 'enable_event' events to the triggering sched_process_exec and
1623 sched_process_exit events as such:
1624
1625 # echo 'enable_event:net:netif_receive_skb if filename==/usr/bin/wget' > \
1626 /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger
1627
1628 # echo 'disable_event:net:netif_receive_skb if comm==wget' > \
1629 /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger
1630
1631 If you read the trigger files for the sched_process_exec and
1632 sched_process_exit triggers, you should see two triggers for each:
1633 one enabling/disabling the hist aggregation and the other
1634 enabling/disabling the logging of events:
1635
1636 # cat /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger
1637 enable_event:net:netif_receive_skb:unlimited if filename==/usr/bin/wget
1638 enable_hist:net:netif_receive_skb:unlimited if filename==/usr/bin/wget
1639
1640 # cat /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger
1641 enable_event:net:netif_receive_skb:unlimited if comm==wget
1642 disable_hist:net:netif_receive_skb:unlimited if comm==wget
1643
 1644 In other words, whenever either the sched_process_exec or the
 1645 sched_process_exit event is hit and matches 'wget', it enables or
1646 disables both the histogram and the event log, and what you end up
1647 with is a hash table and set of events just covering the specified
1648 duration. Run the wget command again:
1649
1650 $ wget https://www.kernel.org/pub/linux/kernel/v3.x/patch-3.19.xz
1651
1652 Displaying the 'hist' file should show something similar to what you
1653 saw in the last run, but this time you should also see the
1654 individual events in the trace file:
1655
1656 # cat /sys/kernel/debug/tracing/trace
1657
1658 # tracer: nop
1659 #
1660 # entries-in-buffer/entries-written: 183/1426 #P:4
1661 #
1662 # _-----=> irqs-off
1663 # / _----=> need-resched
1664 # | / _---=> hardirq/softirq
1665 # || / _--=> preempt-depth
1666 # ||| / delay
1667 # TASK-PID CPU# |||| TIMESTAMP FUNCTION
1668 # | | | |||| | |
1669 wget-15108 [000] ..s1 31769.606929: netif_receive_skb: dev=lo skbaddr=ffff88009c353100 len=60
1670 wget-15108 [000] ..s1 31769.606999: netif_receive_skb: dev=lo skbaddr=ffff88009c353200 len=60
1671 dnsmasq-1382 [000] ..s1 31769.677652: netif_receive_skb: dev=lo skbaddr=ffff88009c352b00 len=130
1672 dnsmasq-1382 [000] ..s1 31769.685917: netif_receive_skb: dev=lo skbaddr=ffff88009c352200 len=138
1673 ##### CPU 2 buffer started ####
1674 irq/29-iwlwifi-559 [002] ..s. 31772.031529: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d433d00 len=2948
1675 irq/29-iwlwifi-559 [002] ..s. 31772.031572: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d432200 len=1500
1676 irq/29-iwlwifi-559 [002] ..s. 31772.032196: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d433100 len=2948
1677 irq/29-iwlwifi-559 [002] ..s. 31772.032761: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d433000 len=2948
1678 irq/29-iwlwifi-559 [002] ..s. 31772.033220: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d432e00 len=1500
1679 .
1680 .
1681 .
1682
1683 The following example demonstrates how multiple hist triggers can be
1684 attached to a given event. This capability can be useful for
1685 creating a set of different summaries derived from the same set of
1686 events, or for comparing the effects of different filters, among
1687 other things.
1688
1689 # echo 'hist:keys=skbaddr.hex:vals=len if len < 0' >> \
1690 /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
1691 # echo 'hist:keys=skbaddr.hex:vals=len if len > 4096' >> \
1692 /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
1693 # echo 'hist:keys=skbaddr.hex:vals=len if len == 256' >> \
1694 /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
1695 # echo 'hist:keys=skbaddr.hex:vals=len' >> \
1696 /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
1697 # echo 'hist:keys=len:vals=common_preempt_count' >> \
1698 /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
1699
 1700 The above set of commands creates four triggers differing only in
1701 their filters, along with a completely different though fairly
1702 nonsensical trigger. Note that in order to append multiple hist
1703 triggers to the same file, you should use the '>>' operator to
1704 append them ('>' will also add the new hist trigger, but will remove
1705 any existing hist triggers beforehand).
1706
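 As with the enable/disable triggers shown earlier, reading the
 event's trigger file should list all five attached hist triggers,
 one per line, which is a handy way to confirm what's attached
 before reading the hist file:

   # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
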
1707 Displaying the contents of the 'hist' file for the event shows the
 1708 output of all five histograms:
1709
1710 # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist
1711
1712 # event histogram
1713 #
1714 # trigger info: hist:keys=len:vals=hitcount,common_preempt_count:sort=hitcount:size=2048 [active]
1715 #
1716
1717 { len: 176 } hitcount: 1 common_preempt_count: 0
1718 { len: 223 } hitcount: 1 common_preempt_count: 0
1719 { len: 4854 } hitcount: 1 common_preempt_count: 0
1720 { len: 395 } hitcount: 1 common_preempt_count: 0
1721 { len: 177 } hitcount: 1 common_preempt_count: 0
1722 { len: 446 } hitcount: 1 common_preempt_count: 0
1723 { len: 1601 } hitcount: 1 common_preempt_count: 0
1724 .
1725 .
1726 .
1727 { len: 1280 } hitcount: 66 common_preempt_count: 0
1728 { len: 116 } hitcount: 81 common_preempt_count: 40
1729 { len: 708 } hitcount: 112 common_preempt_count: 0
1730 { len: 46 } hitcount: 221 common_preempt_count: 0
1731 { len: 1264 } hitcount: 458 common_preempt_count: 0
1732
1733 Totals:
1734 Hits: 1428
1735 Entries: 147
1736 Dropped: 0
1737
1738
1739 # event histogram
1740 #
1741 # trigger info: hist:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 [active]
1742 #
1743
1744 { skbaddr: ffff8800baee5e00 } hitcount: 1 len: 130
1745 { skbaddr: ffff88005f3d5600 } hitcount: 1 len: 1280
1746 { skbaddr: ffff88005f3d4900 } hitcount: 1 len: 1280
1747 { skbaddr: ffff88009fed6300 } hitcount: 1 len: 115
1748 { skbaddr: ffff88009fe0ad00 } hitcount: 1 len: 115
1749 { skbaddr: ffff88008cdb1900 } hitcount: 1 len: 46
1750 { skbaddr: ffff880064b5ef00 } hitcount: 1 len: 118
1751 { skbaddr: ffff880044e3c700 } hitcount: 1 len: 60
1752 { skbaddr: ffff880100065900 } hitcount: 1 len: 46
1753 { skbaddr: ffff8800d46bd500 } hitcount: 1 len: 116
1754 { skbaddr: ffff88005f3d5f00 } hitcount: 1 len: 1280
1755 { skbaddr: ffff880100064700 } hitcount: 1 len: 365
1756 { skbaddr: ffff8800badb6f00 } hitcount: 1 len: 60
1757 .
1758 .
1759 .
1760 { skbaddr: ffff88009fe0be00 } hitcount: 27 len: 24677
1761 { skbaddr: ffff88009fe0a400 } hitcount: 27 len: 23052
1762 { skbaddr: ffff88009fe0b700 } hitcount: 31 len: 25589
1763 { skbaddr: ffff88009fe0b600 } hitcount: 32 len: 27326
1764 { skbaddr: ffff88006a462800 } hitcount: 68 len: 71678
1765 { skbaddr: ffff88006a463700 } hitcount: 70 len: 72678
1766 { skbaddr: ffff88006a462b00 } hitcount: 71 len: 77589
1767 { skbaddr: ffff88006a463600 } hitcount: 73 len: 71307
1768 { skbaddr: ffff88006a462200 } hitcount: 81 len: 81032
1769
1770 Totals:
1771 Hits: 1451
1772 Entries: 318
1773 Dropped: 0
1774
1775
1776 # event histogram
1777 #
1778 # trigger info: hist:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 if len == 256 [active]
1779 #
1780
1781
1782 Totals:
1783 Hits: 0
1784 Entries: 0
1785 Dropped: 0
1786
1787
1788 # event histogram
1789 #
1790 # trigger info: hist:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 if len > 4096 [active]
1791 #
1792
1793 { skbaddr: ffff88009fd2c300 } hitcount: 1 len: 7212
1794 { skbaddr: ffff8800d2bcce00 } hitcount: 1 len: 7212
1795 { skbaddr: ffff8800d2bcd700 } hitcount: 1 len: 7212
1796 { skbaddr: ffff8800d2bcda00 } hitcount: 1 len: 21492
1797 { skbaddr: ffff8800ae2e2d00 } hitcount: 1 len: 7212
1798 { skbaddr: ffff8800d2bcdb00 } hitcount: 1 len: 7212
1799 { skbaddr: ffff88006a4df500 } hitcount: 1 len: 4854
1800 { skbaddr: ffff88008ce47b00 } hitcount: 1 len: 18636
1801 { skbaddr: ffff8800ae2e2200 } hitcount: 1 len: 12924
1802 { skbaddr: ffff88005f3e1000 } hitcount: 1 len: 4356
1803 { skbaddr: ffff8800d2bcdc00 } hitcount: 2 len: 24420
1804 { skbaddr: ffff8800d2bcc200 } hitcount: 2 len: 12996
1805
1806 Totals:
1807 Hits: 14
1808 Entries: 12
1809 Dropped: 0
1810
1811
1812 # event histogram
1813 #
1814 # trigger info: hist:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 if len < 0 [active]
1815 #
1816
1817
1818 Totals:
1819 Hits: 0
1820 Entries: 0
1821 Dropped: 0
1822
1823 Named triggers can be used to have triggers share a common set of
1824 histogram data. This capability is mostly useful for combining the
1825 output of events generated by tracepoints contained inside inline
1826 functions, but names can be used in a hist trigger on any event.
1827 For example, these two triggers when hit will update the same 'len'
1828 field in the shared 'foo' histogram data:
1829
1830 # echo 'hist:name=foo:keys=skbaddr.hex:vals=len' > \
1831 /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
1832 # echo 'hist:name=foo:keys=skbaddr.hex:vals=len' > \
1833 /sys/kernel/debug/tracing/events/net/netif_rx/trigger
1834
1835 You can see that they're updating common histogram data by reading
1836 each event's hist files at the same time:
1837
1838 # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist;
1839 cat /sys/kernel/debug/tracing/events/net/netif_rx/hist
1840
1841 # event histogram
1842 #
1843 # trigger info: hist:name=foo:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 [active]
1844 #
1845
1846 { skbaddr: ffff88000ad53500 } hitcount: 1 len: 46
1847 { skbaddr: ffff8800af5a1500 } hitcount: 1 len: 76
1848 { skbaddr: ffff8800d62a1900 } hitcount: 1 len: 46
1849 { skbaddr: ffff8800d2bccb00 } hitcount: 1 len: 468
1850 { skbaddr: ffff8800d3c69900 } hitcount: 1 len: 46
1851 { skbaddr: ffff88009ff09100 } hitcount: 1 len: 52
1852 { skbaddr: ffff88010f13ab00 } hitcount: 1 len: 168
1853 { skbaddr: ffff88006a54f400 } hitcount: 1 len: 46
1854 { skbaddr: ffff8800d2bcc500 } hitcount: 1 len: 260
1855 { skbaddr: ffff880064505000 } hitcount: 1 len: 46
1856 { skbaddr: ffff8800baf24e00 } hitcount: 1 len: 32
1857 { skbaddr: ffff88009fe0ad00 } hitcount: 1 len: 46
1858 { skbaddr: ffff8800d3edff00 } hitcount: 1 len: 44
1859 { skbaddr: ffff88009fe0b400 } hitcount: 1 len: 168
1860 { skbaddr: ffff8800a1c55a00 } hitcount: 1 len: 40
1861 { skbaddr: ffff8800d2bcd100 } hitcount: 1 len: 40
1862 { skbaddr: ffff880064505f00 } hitcount: 1 len: 174
1863 { skbaddr: ffff8800a8bff200 } hitcount: 1 len: 160
1864 { skbaddr: ffff880044e3cc00 } hitcount: 1 len: 76
1865 { skbaddr: ffff8800a8bfe700 } hitcount: 1 len: 46
1866 { skbaddr: ffff8800d2bcdc00 } hitcount: 1 len: 32
1867 { skbaddr: ffff8800a1f64800 } hitcount: 1 len: 46
1868 { skbaddr: ffff8800d2bcde00 } hitcount: 1 len: 988
1869 { skbaddr: ffff88006a5dea00 } hitcount: 1 len: 46
1870 { skbaddr: ffff88002e37a200 } hitcount: 1 len: 44
1871 { skbaddr: ffff8800a1f32c00 } hitcount: 2 len: 676
1872 { skbaddr: ffff88000ad52600 } hitcount: 2 len: 107
1873 { skbaddr: ffff8800a1f91e00 } hitcount: 2 len: 92
1874 { skbaddr: ffff8800af5a0200 } hitcount: 2 len: 142
1875 { skbaddr: ffff8800d2bcc600 } hitcount: 2 len: 220
1876 { skbaddr: ffff8800ba36f500 } hitcount: 2 len: 92
1877 { skbaddr: ffff8800d021f800 } hitcount: 2 len: 92
1878 { skbaddr: ffff8800a1f33600 } hitcount: 2 len: 675
1879 { skbaddr: ffff8800a8bfff00 } hitcount: 3 len: 138
1880 { skbaddr: ffff8800d62a1300 } hitcount: 3 len: 138
1881 { skbaddr: ffff88002e37a100 } hitcount: 4 len: 184
1882 { skbaddr: ffff880064504400 } hitcount: 4 len: 184
1883 { skbaddr: ffff8800a8bfec00 } hitcount: 4 len: 184
1884 { skbaddr: ffff88000ad53700 } hitcount: 5 len: 230
1885 { skbaddr: ffff8800d2bcdb00 } hitcount: 5 len: 196
1886 { skbaddr: ffff8800a1f90000 } hitcount: 6 len: 276
1887 { skbaddr: ffff88006a54f900 } hitcount: 6 len: 276
1888
1889 Totals:
1890 Hits: 81
1891 Entries: 42
1892 Dropped: 0
1893 # event histogram
1894 #
1895 # trigger info: hist:name=foo:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 [active]
1896 #
1897
1898 { skbaddr: ffff88000ad53500 } hitcount: 1 len: 46
1899 { skbaddr: ffff8800af5a1500 } hitcount: 1 len: 76
1900 { skbaddr: ffff8800d62a1900 } hitcount: 1 len: 46
1901 { skbaddr: ffff8800d2bccb00 } hitcount: 1 len: 468
1902 { skbaddr: ffff8800d3c69900 } hitcount: 1 len: 46
1903 { skbaddr: ffff88009ff09100 } hitcount: 1 len: 52
1904 { skbaddr: ffff88010f13ab00 } hitcount: 1 len: 168
1905 { skbaddr: ffff88006a54f400 } hitcount: 1 len: 46
1906 { skbaddr: ffff8800d2bcc500 } hitcount: 1 len: 260
1907 { skbaddr: ffff880064505000 } hitcount: 1 len: 46
1908 { skbaddr: ffff8800baf24e00 } hitcount: 1 len: 32
1909 { skbaddr: ffff88009fe0ad00 } hitcount: 1 len: 46
1910 { skbaddr: ffff8800d3edff00 } hitcount: 1 len: 44
1911 { skbaddr: ffff88009fe0b400 } hitcount: 1 len: 168
1912 { skbaddr: ffff8800a1c55a00 } hitcount: 1 len: 40
1913 { skbaddr: ffff8800d2bcd100 } hitcount: 1 len: 40
1914 { skbaddr: ffff880064505f00 } hitcount: 1 len: 174
1915 { skbaddr: ffff8800a8bff200 } hitcount: 1 len: 160
1916 { skbaddr: ffff880044e3cc00 } hitcount: 1 len: 76
1917 { skbaddr: ffff8800a8bfe700 } hitcount: 1 len: 46
1918 { skbaddr: ffff8800d2bcdc00 } hitcount: 1 len: 32
1919 { skbaddr: ffff8800a1f64800 } hitcount: 1 len: 46
1920 { skbaddr: ffff8800d2bcde00 } hitcount: 1 len: 988
1921 { skbaddr: ffff88006a5dea00 } hitcount: 1 len: 46
1922 { skbaddr: ffff88002e37a200 } hitcount: 1 len: 44
1923 { skbaddr: ffff8800a1f32c00 } hitcount: 2 len: 676
1924 { skbaddr: ffff88000ad52600 } hitcount: 2 len: 107
1925 { skbaddr: ffff8800a1f91e00 } hitcount: 2 len: 92
1926 { skbaddr: ffff8800af5a0200 } hitcount: 2 len: 142
1927 { skbaddr: ffff8800d2bcc600 } hitcount: 2 len: 220
1928 { skbaddr: ffff8800ba36f500 } hitcount: 2 len: 92
1929 { skbaddr: ffff8800d021f800 } hitcount: 2 len: 92
1930 { skbaddr: ffff8800a1f33600 } hitcount: 2 len: 675
1931 { skbaddr: ffff8800a8bfff00 } hitcount: 3 len: 138
1932 { skbaddr: ffff8800d62a1300 } hitcount: 3 len: 138
1933 { skbaddr: ffff88002e37a100 } hitcount: 4 len: 184
1934 { skbaddr: ffff880064504400 } hitcount: 4 len: 184
1935 { skbaddr: ffff8800a8bfec00 } hitcount: 4 len: 184
1936 { skbaddr: ffff88000ad53700 } hitcount: 5 len: 230
1937 { skbaddr: ffff8800d2bcdb00 } hitcount: 5 len: 196
1938 { skbaddr: ffff8800a1f90000 } hitcount: 6 len: 276
1939 { skbaddr: ffff88006a54f900 } hitcount: 6 len: 276
1940
1941 Totals:
1942 Hits: 81
1943 Entries: 42
1944 Dropped: 0
1945
1946 And here's an example that shows how to combine histogram data from
1947 any two events even if they don't share any 'compatible' fields
1948 other than 'hitcount' and 'stacktrace'. These commands create a
1949 couple of triggers named 'bar' using those fields:
1950
1951 # echo 'hist:name=bar:key=stacktrace:val=hitcount' > \
1952 /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger
1953 # echo 'hist:name=bar:key=stacktrace:val=hitcount' > \
1954 /sys/kernel/debug/tracing/events/net/netif_rx/trigger
1955
1956 And displaying the output of either shows some interesting if
 1957 somewhat confusing results:
1958
1959 # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist
1960 # cat /sys/kernel/debug/tracing/events/net/netif_rx/hist
1961
1962 # event histogram
1963 #
1964 # trigger info: hist:name=bar:keys=stacktrace:vals=hitcount:sort=hitcount:size=2048 [active]
1965 #
1966
1967 { stacktrace:
1968 _do_fork+0x18e/0x330
1969 kernel_thread+0x29/0x30
1970 kthreadd+0x154/0x1b0
1971 ret_from_fork+0x3f/0x70
1972 } hitcount: 1
1973 { stacktrace:
1974 netif_rx_internal+0xb2/0xd0
1975 netif_rx_ni+0x20/0x70
1976 dev_loopback_xmit+0xaa/0xd0
1977 ip_mc_output+0x126/0x240
1978 ip_local_out_sk+0x31/0x40
1979 igmp_send_report+0x1e9/0x230
1980 igmp_timer_expire+0xe9/0x120
1981 call_timer_fn+0x39/0xf0
1982 run_timer_softirq+0x1e1/0x290
1983 __do_softirq+0xfd/0x290
1984 irq_exit+0x98/0xb0
1985 smp_apic_timer_interrupt+0x4a/0x60
1986 apic_timer_interrupt+0x6d/0x80
1987 cpuidle_enter+0x17/0x20
1988 call_cpuidle+0x3b/0x60
1989 cpu_startup_entry+0x22d/0x310
1990 } hitcount: 1
1991 { stacktrace:
1992 netif_rx_internal+0xb2/0xd0
1993 netif_rx_ni+0x20/0x70
1994 dev_loopback_xmit+0xaa/0xd0
1995 ip_mc_output+0x17f/0x240
1996 ip_local_out_sk+0x31/0x40
1997 ip_send_skb+0x1a/0x50
1998 udp_send_skb+0x13e/0x270
1999 udp_sendmsg+0x2bf/0x980
2000 inet_sendmsg+0x67/0xa0
2001 sock_sendmsg+0x38/0x50
2002 SYSC_sendto+0xef/0x170
2003 SyS_sendto+0xe/0x10
2004 entry_SYSCALL_64_fastpath+0x12/0x6a
2005 } hitcount: 2
2006 { stacktrace:
2007 netif_rx_internal+0xb2/0xd0
2008 netif_rx+0x1c/0x60
2009 loopback_xmit+0x6c/0xb0
2010 dev_hard_start_xmit+0x219/0x3a0
2011 __dev_queue_xmit+0x415/0x4f0
2012 dev_queue_xmit_sk+0x13/0x20
2013 ip_finish_output2+0x237/0x340
2014 ip_finish_output+0x113/0x1d0
2015 ip_output+0x66/0xc0
2016 ip_local_out_sk+0x31/0x40
2017 ip_send_skb+0x1a/0x50
2018 udp_send_skb+0x16d/0x270
2019 udp_sendmsg+0x2bf/0x980
2020 inet_sendmsg+0x67/0xa0
2021 sock_sendmsg+0x38/0x50
2022 ___sys_sendmsg+0x14e/0x270
2023 } hitcount: 76
2024 { stacktrace:
2025 netif_rx_internal+0xb2/0xd0
2026 netif_rx+0x1c/0x60
2027 loopback_xmit+0x6c/0xb0
2028 dev_hard_start_xmit+0x219/0x3a0
2029 __dev_queue_xmit+0x415/0x4f0
2030 dev_queue_xmit_sk+0x13/0x20
2031 ip_finish_output2+0x237/0x340
2032 ip_finish_output+0x113/0x1d0
2033 ip_output+0x66/0xc0
2034 ip_local_out_sk+0x31/0x40
2035 ip_send_skb+0x1a/0x50
2036 udp_send_skb+0x16d/0x270
2037 udp_sendmsg+0x2bf/0x980
2038 inet_sendmsg+0x67/0xa0
2039 sock_sendmsg+0x38/0x50
2040 ___sys_sendmsg+0x269/0x270
2041 } hitcount: 77
2042 { stacktrace:
2043 netif_rx_internal+0xb2/0xd0
2044 netif_rx+0x1c/0x60
2045 loopback_xmit+0x6c/0xb0
2046 dev_hard_start_xmit+0x219/0x3a0
2047 __dev_queue_xmit+0x415/0x4f0
2048 dev_queue_xmit_sk+0x13/0x20
2049 ip_finish_output2+0x237/0x340
2050 ip_finish_output+0x113/0x1d0
2051 ip_output+0x66/0xc0
2052 ip_local_out_sk+0x31/0x40
2053 ip_send_skb+0x1a/0x50
2054 udp_send_skb+0x16d/0x270
2055 udp_sendmsg+0x2bf/0x980
2056 inet_sendmsg+0x67/0xa0
2057 sock_sendmsg+0x38/0x50
2058 SYSC_sendto+0xef/0x170
2059 } hitcount: 88
2060 { stacktrace:
2061 _do_fork+0x18e/0x330
2062 SyS_clone+0x19/0x20
2063 entry_SYSCALL_64_fastpath+0x12/0x6a
2064 } hitcount: 244
2065
2066 Totals:
2067 Hits: 489
2068 Entries: 7
2069 Dropped: 0
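
 When you're done with the shared histogram, the named triggers can
 be detached again. As a sketch, assuming the usual '!' removal
 syntax for triggers also applies to named hist triggers, prepending
 '!' to each original command removes them:

   # echo '!hist:name=bar:key=stacktrace:val=hitcount' > \
     /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger
   # echo '!hist:name=bar:key=stacktrace:val=hitcount' > \
     /sys/kernel/debug/tracing/events/net/netif_rx/trigger
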
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index 9857606dd7b7..a6b3705e62a6 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -210,6 +210,11 @@ of ftrace. Here is a list of some of the key files:
210 Note, sched_switch and sched_wake_up will also trace events 210 Note, sched_switch and sched_wake_up will also trace events
211 listed in this file. 211 listed in this file.
212 212
 213 To have the PIDs of children of tasks listed in this file added to
 214 it automatically on fork, enable the "event-fork" option. That
 215 option will also cause a task's PID to be removed from this file
 216 when the task exits.
217
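 A minimal usage sketch, assuming the usual debugfs mount point and
 using 1234 as an arbitrary example PID:

   # echo 1234 > /sys/kernel/debug/tracing/set_event_pid
   # echo 1 > /sys/kernel/debug/tracing/options/event-fork
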
213 set_graph_function: 218 set_graph_function:
214 219
215 Set a "trigger" function where tracing should start 220 Set a "trigger" function where tracing should start
@@ -725,16 +730,14 @@ noraw
725nohex 730nohex
726nobin 731nobin
727noblock 732noblock
728nostacktrace
729trace_printk 733trace_printk
730noftrace_preempt
731nobranch 734nobranch
732annotate 735annotate
733nouserstacktrace 736nouserstacktrace
734nosym-userobj 737nosym-userobj
735noprintk-msg-only 738noprintk-msg-only
736context-info 739context-info
737latency-format 740nolatency-format
738sleep-time 741sleep-time
739graph-time 742graph-time
740record-cmd 743record-cmd
@@ -742,7 +745,10 @@ overwrite
742nodisable_on_free 745nodisable_on_free
743irq-info 746irq-info
744markers 747markers
748noevent-fork
745function-trace 749function-trace
750nodisplay-graph
751nostacktrace
746 752
747To disable one of the options, echo in the option prepended with 753To disable one of the options, echo in the option prepended with
748"no". 754"no".
@@ -796,11 +802,6 @@ Here are the available options:
796 802
797 block - When set, reading trace_pipe will not block when polled. 803 block - When set, reading trace_pipe will not block when polled.
798 804
799 stacktrace - This is one of the options that changes the trace
800 itself. When a trace is recorded, so is the stack
801 of functions. This allows for back traces of
802 trace sites.
803
804 trace_printk - Can disable trace_printk() from writing into the buffer. 805 trace_printk - Can disable trace_printk() from writing into the buffer.
805 806
806 branch - Enable branch tracing with the tracer. 807 branch - Enable branch tracing with the tracer.
@@ -897,6 +898,10 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
897 When disabled, the trace_marker will error with EINVAL 898 When disabled, the trace_marker will error with EINVAL
898 on write. 899 on write.
899 900
901 event-fork - When set, tasks with PIDs listed in set_event_pid will have
902 the PIDs of their children added to set_event_pid when those
903 tasks fork. Also, when tasks with PIDs in set_event_pid exit,
904 their PIDs will be removed from the file.
900 905
901 function-trace - The latency tracers will enable function tracing 906 function-trace - The latency tracers will enable function tracing
902 if this option is enabled (default it is). When 907 if this option is enabled (default it is). When
@@ -904,8 +909,17 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
904 functions. This keeps the overhead of the tracer down 909 functions. This keeps the overhead of the tracer down
905 when performing latency tests. 910 when performing latency tests.
906 911
907 Note: Some tracers have their own options. They only appear 912 display-graph - When set, the latency tracers (irqsoff, wakeup, etc) will
908 when the tracer is active. 913 use function graph tracing instead of function tracing.
914
915 stacktrace - This is one of the options that changes the trace
916 itself. When a trace is recorded, so is the stack
917 of functions. This allows for back traces of
918 trace sites.
919
920 Note: Some tracers have their own options. They only appear in this
921 file when the tracer is active. They always appear in the
922 options directory.
909 923
910 924
911 925
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 222f6aa0418f..be007610ceb0 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -154,21 +154,6 @@ trace_event_buffer_lock_reserve(struct ring_buffer **current_buffer,
154 struct trace_event_file *trace_file, 154 struct trace_event_file *trace_file,
155 int type, unsigned long len, 155 int type, unsigned long len,
156 unsigned long flags, int pc); 156 unsigned long flags, int pc);
157struct ring_buffer_event *
158trace_current_buffer_lock_reserve(struct ring_buffer **current_buffer,
159 int type, unsigned long len,
160 unsigned long flags, int pc);
161void trace_buffer_unlock_commit(struct trace_array *tr,
162 struct ring_buffer *buffer,
163 struct ring_buffer_event *event,
164 unsigned long flags, int pc);
165void trace_buffer_unlock_commit_regs(struct trace_array *tr,
166 struct ring_buffer *buffer,
167 struct ring_buffer_event *event,
168 unsigned long flags, int pc,
169 struct pt_regs *regs);
170void trace_current_buffer_discard_commit(struct ring_buffer *buffer,
171 struct ring_buffer_event *event);
172 157
173void tracing_record_cmdline(struct task_struct *tsk); 158void tracing_record_cmdline(struct task_struct *tsk);
174 159
@@ -229,7 +214,6 @@ enum {
229 TRACE_EVENT_FL_NO_SET_FILTER_BIT, 214 TRACE_EVENT_FL_NO_SET_FILTER_BIT,
230 TRACE_EVENT_FL_IGNORE_ENABLE_BIT, 215 TRACE_EVENT_FL_IGNORE_ENABLE_BIT,
231 TRACE_EVENT_FL_WAS_ENABLED_BIT, 216 TRACE_EVENT_FL_WAS_ENABLED_BIT,
232 TRACE_EVENT_FL_USE_CALL_FILTER_BIT,
233 TRACE_EVENT_FL_TRACEPOINT_BIT, 217 TRACE_EVENT_FL_TRACEPOINT_BIT,
234 TRACE_EVENT_FL_KPROBE_BIT, 218 TRACE_EVENT_FL_KPROBE_BIT,
235 TRACE_EVENT_FL_UPROBE_BIT, 219 TRACE_EVENT_FL_UPROBE_BIT,
@@ -244,7 +228,6 @@ enum {
244 * WAS_ENABLED - Set and stays set when an event was ever enabled 228 * WAS_ENABLED - Set and stays set when an event was ever enabled
245 * (used for module unloading, if a module event is enabled, 229 * (used for module unloading, if a module event is enabled,
246 * it is best to clear the buffers that used it). 230 * it is best to clear the buffers that used it).
247 * USE_CALL_FILTER - For trace internal events, don't use file filter
248 * TRACEPOINT - Event is a tracepoint 231 * TRACEPOINT - Event is a tracepoint
249 * KPROBE - Event is a kprobe 232 * KPROBE - Event is a kprobe
250 * UPROBE - Event is a uprobe 233 * UPROBE - Event is a uprobe
@@ -255,7 +238,6 @@ enum {
255 TRACE_EVENT_FL_NO_SET_FILTER = (1 << TRACE_EVENT_FL_NO_SET_FILTER_BIT), 238 TRACE_EVENT_FL_NO_SET_FILTER = (1 << TRACE_EVENT_FL_NO_SET_FILTER_BIT),
256 TRACE_EVENT_FL_IGNORE_ENABLE = (1 << TRACE_EVENT_FL_IGNORE_ENABLE_BIT), 239 TRACE_EVENT_FL_IGNORE_ENABLE = (1 << TRACE_EVENT_FL_IGNORE_ENABLE_BIT),
257 TRACE_EVENT_FL_WAS_ENABLED = (1 << TRACE_EVENT_FL_WAS_ENABLED_BIT), 240 TRACE_EVENT_FL_WAS_ENABLED = (1 << TRACE_EVENT_FL_WAS_ENABLED_BIT),
258 TRACE_EVENT_FL_USE_CALL_FILTER = (1 << TRACE_EVENT_FL_USE_CALL_FILTER_BIT),
259 TRACE_EVENT_FL_TRACEPOINT = (1 << TRACE_EVENT_FL_TRACEPOINT_BIT), 241 TRACE_EVENT_FL_TRACEPOINT = (1 << TRACE_EVENT_FL_TRACEPOINT_BIT),
260 TRACE_EVENT_FL_KPROBE = (1 << TRACE_EVENT_FL_KPROBE_BIT), 242 TRACE_EVENT_FL_KPROBE = (1 << TRACE_EVENT_FL_KPROBE_BIT),
261 TRACE_EVENT_FL_UPROBE = (1 << TRACE_EVENT_FL_UPROBE_BIT), 243 TRACE_EVENT_FL_UPROBE = (1 << TRACE_EVENT_FL_UPROBE_BIT),
@@ -407,16 +389,12 @@ enum event_trigger_type {
407 ETT_SNAPSHOT = (1 << 1), 389 ETT_SNAPSHOT = (1 << 1),
408 ETT_STACKTRACE = (1 << 2), 390 ETT_STACKTRACE = (1 << 2),
409 ETT_EVENT_ENABLE = (1 << 3), 391 ETT_EVENT_ENABLE = (1 << 3),
392 ETT_EVENT_HIST = (1 << 4),
393 ETT_HIST_ENABLE = (1 << 5),
410}; 394};
411 395
412extern int filter_match_preds(struct event_filter *filter, void *rec); 396extern int filter_match_preds(struct event_filter *filter, void *rec);
413 397
414extern int filter_check_discard(struct trace_event_file *file, void *rec,
415 struct ring_buffer *buffer,
416 struct ring_buffer_event *event);
417extern int call_filter_check_discard(struct trace_event_call *call, void *rec,
418 struct ring_buffer *buffer,
419 struct ring_buffer_event *event);
420extern enum event_trigger_type event_triggers_call(struct trace_event_file *file, 398extern enum event_trigger_type event_triggers_call(struct trace_event_file *file,
421 void *rec); 399 void *rec);
422extern void event_triggers_post_call(struct trace_event_file *file, 400extern void event_triggers_post_call(struct trace_event_file *file,
@@ -450,100 +428,6 @@ trace_trigger_soft_disabled(struct trace_event_file *file)
450 return false; 428 return false;
451} 429}
452 430
453/*
454 * Helper function for event_trigger_unlock_commit{_regs}().
455 * If there are event triggers attached to this event that requires
456 * filtering against its fields, then they wil be called as the
457 * entry already holds the field information of the current event.
458 *
459 * It also checks if the event should be discarded or not.
460 * It is to be discarded if the event is soft disabled and the
461 * event was only recorded to process triggers, or if the event
462 * filter is active and this event did not match the filters.
463 *
464 * Returns true if the event is discarded, false otherwise.
465 */
466static inline bool
467__event_trigger_test_discard(struct trace_event_file *file,
468 struct ring_buffer *buffer,
469 struct ring_buffer_event *event,
470 void *entry,
471 enum event_trigger_type *tt)
472{
473 unsigned long eflags = file->flags;
474
475 if (eflags & EVENT_FILE_FL_TRIGGER_COND)
476 *tt = event_triggers_call(file, entry);
477
478 if (test_bit(EVENT_FILE_FL_SOFT_DISABLED_BIT, &file->flags))
479 ring_buffer_discard_commit(buffer, event);
480 else if (!filter_check_discard(file, entry, buffer, event))
481 return false;
482
483 return true;
484}
485
486/**
487 * event_trigger_unlock_commit - handle triggers and finish event commit
488 * @file: The file pointer assoctiated to the event
489 * @buffer: The ring buffer that the event is being written to
490 * @event: The event meta data in the ring buffer
491 * @entry: The event itself
492 * @irq_flags: The state of the interrupts at the start of the event
493 * @pc: The state of the preempt count at the start of the event.
494 *
495 * This is a helper function to handle triggers that require data
496 * from the event itself. It also tests the event against filters and
497 * if the event is soft disabled and should be discarded.
498 */
499static inline void
500event_trigger_unlock_commit(struct trace_event_file *file,
501 struct ring_buffer *buffer,
502 struct ring_buffer_event *event,
503 void *entry, unsigned long irq_flags, int pc)
504{
505 enum event_trigger_type tt = ETT_NONE;
506
507 if (!__event_trigger_test_discard(file, buffer, event, entry, &tt))
508 trace_buffer_unlock_commit(file->tr, buffer, event, irq_flags, pc);
509
510 if (tt)
511 event_triggers_post_call(file, tt, entry);
512}
513
514/**
515 * event_trigger_unlock_commit_regs - handle triggers and finish event commit
516 * @file: The file pointer assoctiated to the event
517 * @buffer: The ring buffer that the event is being written to
518 * @event: The event meta data in the ring buffer
519 * @entry: The event itself
520 * @irq_flags: The state of the interrupts at the start of the event
521 * @pc: The state of the preempt count at the start of the event.
522 *
523 * This is a helper function to handle triggers that require data
524 * from the event itself. It also tests the event against filters and
525 * if the event is soft disabled and should be discarded.
526 *
527 * Same as event_trigger_unlock_commit() but calls
528 * trace_buffer_unlock_commit_regs() instead of trace_buffer_unlock_commit().
529 */
530static inline void
531event_trigger_unlock_commit_regs(struct trace_event_file *file,
532 struct ring_buffer *buffer,
533 struct ring_buffer_event *event,
534 void *entry, unsigned long irq_flags, int pc,
535 struct pt_regs *regs)
536{
537 enum event_trigger_type tt = ETT_NONE;
538
539 if (!__event_trigger_test_discard(file, buffer, event, entry, &tt))
540 trace_buffer_unlock_commit_regs(file->tr, buffer, event,
541 irq_flags, pc, regs);
542
543 if (tt)
544 event_triggers_post_call(file, tt, entry);
545}
546
547#ifdef CONFIG_BPF_EVENTS 431#ifdef CONFIG_BPF_EVENTS
548unsigned int trace_call_bpf(struct bpf_prog *prog, void *ctx); 432unsigned int trace_call_bpf(struct bpf_prog *prog, void *ctx);
549#else 433#else
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index e45db6b0d878..fafeaf803bd0 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -528,6 +528,32 @@ config MMIOTRACE
528 See Documentation/trace/mmiotrace.txt. 528 See Documentation/trace/mmiotrace.txt.
529 If you are not helping to develop drivers, say N. 529 If you are not helping to develop drivers, say N.
530 530
531config TRACING_MAP
532 bool
533 depends on ARCH_HAVE_NMI_SAFE_CMPXCHG
534 help
535 tracing_map is a special-purpose lock-free map for tracing,
536 separated out as a stand-alone facility in order to allow it
537 to be shared between multiple tracers. It isn't meant to be
538 generally used outside of that context, and is normally
539 selected by tracers that use it.
540
541config HIST_TRIGGERS
542 bool "Histogram triggers"
543 depends on ARCH_HAVE_NMI_SAFE_CMPXCHG
544 select TRACING_MAP
545 default n
546 help
547 Hist triggers allow one or more arbitrary trace event fields
548 to be aggregated into hash tables and dumped to stdout by
549 reading a debugfs/tracefs file. They're useful for
550 gathering quick and dirty (though precise) summaries of
551 event activity as an initial guide for further investigation
552 using more advanced tools.
553
554 See Documentation/trace/events.txt.
555 If in doubt, say N.
556
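 As a sketch, the corresponding fragment of a kernel .config built
 with histogram triggers enabled would contain (TRACING_MAP being
 selected automatically):

   CONFIG_TRACING_MAP=y
   CONFIG_HIST_TRIGGERS=y
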
531config MMIOTRACE_TEST 557config MMIOTRACE_TEST
532 tristate "Test module for mmiotrace" 558 tristate "Test module for mmiotrace"
533 depends on MMIOTRACE && m 559 depends on MMIOTRACE && m
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 9b1044e936a6..979e7bfbde7a 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -31,6 +31,7 @@ obj-$(CONFIG_TRACING) += trace_output.o
31obj-$(CONFIG_TRACING) += trace_seq.o 31obj-$(CONFIG_TRACING) += trace_seq.o
32obj-$(CONFIG_TRACING) += trace_stat.o 32obj-$(CONFIG_TRACING) += trace_stat.o
33obj-$(CONFIG_TRACING) += trace_printk.o 33obj-$(CONFIG_TRACING) += trace_printk.o
34obj-$(CONFIG_TRACING_MAP) += tracing_map.o
34obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o 35obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
35obj-$(CONFIG_FUNCTION_TRACER) += trace_functions.o 36obj-$(CONFIG_FUNCTION_TRACER) += trace_functions.o
36obj-$(CONFIG_IRQSOFF_TRACER) += trace_irqsoff.o 37obj-$(CONFIG_IRQSOFF_TRACER) += trace_irqsoff.o
@@ -53,6 +54,7 @@ obj-$(CONFIG_EVENT_TRACING) += trace_event_perf.o
53endif 54endif
54obj-$(CONFIG_EVENT_TRACING) += trace_events_filter.o 55obj-$(CONFIG_EVENT_TRACING) += trace_events_filter.o
55obj-$(CONFIG_EVENT_TRACING) += trace_events_trigger.o 56obj-$(CONFIG_EVENT_TRACING) += trace_events_trigger.o
57obj-$(CONFIG_HIST_TRIGGERS) += trace_events_hist.o
56obj-$(CONFIG_BPF_EVENTS) += bpf_trace.o 58obj-$(CONFIG_BPF_EVENTS) += bpf_trace.o
57obj-$(CONFIG_KPROBE_EVENT) += trace_kprobe.o 59obj-$(CONFIG_KPROBE_EVENT) += trace_kprobe.o
58obj-$(CONFIG_TRACEPOINTS) += power-traces.o 60obj-$(CONFIG_TRACEPOINTS) += power-traces.o
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index a2f0b9f33e9b..8a4bd6b68a0b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -253,6 +253,9 @@ unsigned long long ns2usecs(cycle_t nsec)
253#define TOP_LEVEL_TRACE_FLAGS (TRACE_ITER_PRINTK | \ 253#define TOP_LEVEL_TRACE_FLAGS (TRACE_ITER_PRINTK | \
254 TRACE_ITER_PRINTK_MSGONLY | TRACE_ITER_RECORD_CMD) 254 TRACE_ITER_PRINTK_MSGONLY | TRACE_ITER_RECORD_CMD)
255 255
256/* trace_flags that are default zero for instances */
257#define ZEROED_TRACE_FLAGS \
258 TRACE_ITER_EVENT_FORK
256 259
257/* 260/*
258 * The global_trace is the descriptor that holds the tracing 261 * The global_trace is the descriptor that holds the tracing
@@ -303,33 +306,18 @@ void trace_array_put(struct trace_array *this_tr)
303 mutex_unlock(&trace_types_lock); 306 mutex_unlock(&trace_types_lock);
304} 307}
305 308
306int filter_check_discard(struct trace_event_file *file, void *rec,
307 struct ring_buffer *buffer,
308 struct ring_buffer_event *event)
309{
310 if (unlikely(file->flags & EVENT_FILE_FL_FILTERED) &&
311 !filter_match_preds(file->filter, rec)) {
312 ring_buffer_discard_commit(buffer, event);
313 return 1;
314 }
315
316 return 0;
317}
318EXPORT_SYMBOL_GPL(filter_check_discard);
319
320int call_filter_check_discard(struct trace_event_call *call, void *rec, 309int call_filter_check_discard(struct trace_event_call *call, void *rec,
321 struct ring_buffer *buffer, 310 struct ring_buffer *buffer,
322 struct ring_buffer_event *event) 311 struct ring_buffer_event *event)
323{ 312{
324 if (unlikely(call->flags & TRACE_EVENT_FL_FILTERED) && 313 if (unlikely(call->flags & TRACE_EVENT_FL_FILTERED) &&
325 !filter_match_preds(call->filter, rec)) { 314 !filter_match_preds(call->filter, rec)) {
326 ring_buffer_discard_commit(buffer, event); 315 __trace_event_discard_commit(buffer, event);
327 return 1; 316 return 1;
328 } 317 }
329 318
330 return 0; 319 return 0;
331} 320}
332EXPORT_SYMBOL_GPL(call_filter_check_discard);
333 321
334static cycle_t buffer_ftrace_now(struct trace_buffer *buf, int cpu) 322static cycle_t buffer_ftrace_now(struct trace_buffer *buf, int cpu)
335{ 323{
@@ -1672,6 +1660,16 @@ tracing_generic_entry_update(struct trace_entry *entry, unsigned long flags,
1672} 1660}
1673EXPORT_SYMBOL_GPL(tracing_generic_entry_update); 1661EXPORT_SYMBOL_GPL(tracing_generic_entry_update);
1674 1662
1663static __always_inline void
1664trace_event_setup(struct ring_buffer_event *event,
1665 int type, unsigned long flags, int pc)
1666{
1667 struct trace_entry *ent = ring_buffer_event_data(event);
1668
1669 tracing_generic_entry_update(ent, flags, pc);
1670 ent->type = type;
1671}
1672
1675struct ring_buffer_event * 1673struct ring_buffer_event *
1676trace_buffer_lock_reserve(struct ring_buffer *buffer, 1674trace_buffer_lock_reserve(struct ring_buffer *buffer,
1677 int type, 1675 int type,
@@ -1681,34 +1679,137 @@ trace_buffer_lock_reserve(struct ring_buffer *buffer,
1681 struct ring_buffer_event *event; 1679 struct ring_buffer_event *event;
1682 1680
1683 event = ring_buffer_lock_reserve(buffer, len); 1681 event = ring_buffer_lock_reserve(buffer, len);
1684 if (event != NULL) { 1682 if (event != NULL)
1685 struct trace_entry *ent = ring_buffer_event_data(event); 1683 trace_event_setup(event, type, flags, pc);
1684
1685 return event;
1686}
1687
1688DEFINE_PER_CPU(struct ring_buffer_event *, trace_buffered_event);
1689DEFINE_PER_CPU(int, trace_buffered_event_cnt);
1690static int trace_buffered_event_ref;
1691
1692/**
1693 * trace_buffered_event_enable - enable buffering events
1694 *
1695 * When events are being filtered, it is quicker to use a temporary
1696 * buffer to write the event data into if there's a likely chance
 1697 * that it will not be committed. Discarding an event from the
 1698 * ring buffer is not as fast as committing it, and is much slower
 1699 * than copying the data and committing in one shot.
1700 *
1701 * When an event is to be filtered, allocate per cpu buffers to
1702 * write the event data into, and if the event is filtered and discarded
1703 * it is simply dropped, otherwise, the entire data is to be committed
1704 * in one shot.
1705 */
1706void trace_buffered_event_enable(void)
1707{
1708 struct ring_buffer_event *event;
1709 struct page *page;
1710 int cpu;
1686 1711
1687 tracing_generic_entry_update(ent, flags, pc); 1712 WARN_ON_ONCE(!mutex_is_locked(&event_mutex));
1688 ent->type = type; 1713
1714 if (trace_buffered_event_ref++)
1715 return;
1716
1717 for_each_tracing_cpu(cpu) {
1718 page = alloc_pages_node(cpu_to_node(cpu),
1719 GFP_KERNEL | __GFP_NORETRY, 0);
1720 if (!page)
1721 goto failed;
1722
1723 event = page_address(page);
1724 memset(event, 0, sizeof(*event));
1725
1726 per_cpu(trace_buffered_event, cpu) = event;
1727
1728 preempt_disable();
1729 if (cpu == smp_processor_id() &&
1730 this_cpu_read(trace_buffered_event) !=
1731 per_cpu(trace_buffered_event, cpu))
1732 WARN_ON_ONCE(1);
1733 preempt_enable();
1689 } 1734 }
1690 1735
1691 return event; 1736 return;
1737 failed:
1738 trace_buffered_event_disable();
1692} 1739}
1693 1740
1694void 1741static void enable_trace_buffered_event(void *data)
1695__buffer_unlock_commit(struct ring_buffer *buffer, struct ring_buffer_event *event)
1696{ 1742{
1697 __this_cpu_write(trace_cmdline_save, true); 1743 /* Probably not needed, but do it anyway */
1698 ring_buffer_unlock_commit(buffer, event); 1744 smp_rmb();
1745 this_cpu_dec(trace_buffered_event_cnt);
1699} 1746}
1700 1747
1701void trace_buffer_unlock_commit(struct trace_array *tr, 1748static void disable_trace_buffered_event(void *data)
1702 struct ring_buffer *buffer,
1703 struct ring_buffer_event *event,
1704 unsigned long flags, int pc)
1705{ 1749{
1706 __buffer_unlock_commit(buffer, event); 1750 this_cpu_inc(trace_buffered_event_cnt);
1751}
1707 1752
1708 ftrace_trace_stack(tr, buffer, flags, 6, pc, NULL); 1753/**
1709 ftrace_trace_userstack(buffer, flags, pc); 1754 * trace_buffered_event_disable - disable buffering events
1755 *
1756 * When a filter is removed, it is faster to not use the buffered
1757 * events, and to commit directly into the ring buffer. Free up
1758 * the temp buffers when there are no more users. This requires
1759 * special synchronization with current events.
1760 */
1761void trace_buffered_event_disable(void)
1762{
1763 int cpu;
1764
1765 WARN_ON_ONCE(!mutex_is_locked(&event_mutex));
1766
1767 if (WARN_ON_ONCE(!trace_buffered_event_ref))
1768 return;
1769
1770 if (--trace_buffered_event_ref)
1771 return;
1772
1773 preempt_disable();
1774 /* For each CPU, set the buffer as used. */
1775 smp_call_function_many(tracing_buffer_mask,
1776 disable_trace_buffered_event, NULL, 1);
1777 preempt_enable();
1778
1779 /* Wait for all current users to finish */
1780 synchronize_sched();
1781
1782 for_each_tracing_cpu(cpu) {
1783 free_page((unsigned long)per_cpu(trace_buffered_event, cpu));
1784 per_cpu(trace_buffered_event, cpu) = NULL;
1785 }
1786 /*
1787 * Make sure trace_buffered_event is NULL before clearing
1788 * trace_buffered_event_cnt.
1789 */
1790 smp_wmb();
1791
1792 preempt_disable();
1793 /* Do the work on each cpu */
1794 smp_call_function_many(tracing_buffer_mask,
1795 enable_trace_buffered_event, NULL, 1);
1796 preempt_enable();
1797}
1798
1799void
1800__buffer_unlock_commit(struct ring_buffer *buffer, struct ring_buffer_event *event)
1801{
1802 __this_cpu_write(trace_cmdline_save, true);
1803
1804 /* If this is the temp buffer, we need to commit fully */
1805 if (this_cpu_read(trace_buffered_event) == event) {
1806 /* Length is in event->array[0] */
1807 ring_buffer_write(buffer, event->array[0], &event->array[1]);
1808 /* Release the temp buffer */
1809 this_cpu_dec(trace_buffered_event_cnt);
1810 } else
1811 ring_buffer_unlock_commit(buffer, event);
1710} 1812}
1711EXPORT_SYMBOL_GPL(trace_buffer_unlock_commit);
1712 1813
1713static struct ring_buffer *temp_buffer; 1814static struct ring_buffer *temp_buffer;
1714 1815
@@ -1719,8 +1820,23 @@ trace_event_buffer_lock_reserve(struct ring_buffer **current_rb,
1719 unsigned long flags, int pc) 1820 unsigned long flags, int pc)
1720{ 1821{
1721 struct ring_buffer_event *entry; 1822 struct ring_buffer_event *entry;
1823 int val;
1722 1824
1723 *current_rb = trace_file->tr->trace_buffer.buffer; 1825 *current_rb = trace_file->tr->trace_buffer.buffer;
1826
1827 if ((trace_file->flags &
1828 (EVENT_FILE_FL_SOFT_DISABLED | EVENT_FILE_FL_FILTERED)) &&
1829 (entry = this_cpu_read(trace_buffered_event))) {
1830 /* Try to use the per cpu buffer first */
1831 val = this_cpu_inc_return(trace_buffered_event_cnt);
1832 if (val == 1) {
1833 trace_event_setup(entry, type, flags, pc);
1834 entry->array[0] = len;
1835 return entry;
1836 }
1837 this_cpu_dec(trace_buffered_event_cnt);
1838 }
1839
1724 entry = trace_buffer_lock_reserve(*current_rb, 1840 entry = trace_buffer_lock_reserve(*current_rb,
1725 type, len, flags, pc); 1841 type, len, flags, pc);
1726 /* 1842 /*
@@ -1738,17 +1854,6 @@ trace_event_buffer_lock_reserve(struct ring_buffer **current_rb,
1738} 1854}
1739EXPORT_SYMBOL_GPL(trace_event_buffer_lock_reserve); 1855EXPORT_SYMBOL_GPL(trace_event_buffer_lock_reserve);
1740 1856
1741struct ring_buffer_event *
1742trace_current_buffer_lock_reserve(struct ring_buffer **current_rb,
1743 int type, unsigned long len,
1744 unsigned long flags, int pc)
1745{
1746 *current_rb = global_trace.trace_buffer.buffer;
1747 return trace_buffer_lock_reserve(*current_rb,
1748 type, len, flags, pc);
1749}
1750EXPORT_SYMBOL_GPL(trace_current_buffer_lock_reserve);
1751
1752void trace_buffer_unlock_commit_regs(struct trace_array *tr, 1857void trace_buffer_unlock_commit_regs(struct trace_array *tr,
1753 struct ring_buffer *buffer, 1858 struct ring_buffer *buffer,
1754 struct ring_buffer_event *event, 1859 struct ring_buffer_event *event,
@@ -1760,14 +1865,6 @@ void trace_buffer_unlock_commit_regs(struct trace_array *tr,
1760 ftrace_trace_stack(tr, buffer, flags, 0, pc, regs); 1865 ftrace_trace_stack(tr, buffer, flags, 0, pc, regs);
1761 ftrace_trace_userstack(buffer, flags, pc); 1866 ftrace_trace_userstack(buffer, flags, pc);
1762} 1867}
1763EXPORT_SYMBOL_GPL(trace_buffer_unlock_commit_regs);
1764
1765void trace_current_buffer_discard_commit(struct ring_buffer *buffer,
1766 struct ring_buffer_event *event)
1767{
1768 ring_buffer_discard_commit(buffer, event);
1769}
1770EXPORT_SYMBOL_GPL(trace_current_buffer_discard_commit);
1771 1868
1772void 1869void
1773trace_function(struct trace_array *tr, 1870trace_function(struct trace_array *tr,
@@ -3571,6 +3668,9 @@ int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled)
3571 if (mask == TRACE_ITER_RECORD_CMD) 3668 if (mask == TRACE_ITER_RECORD_CMD)
3572 trace_event_enable_cmd_record(enabled); 3669 trace_event_enable_cmd_record(enabled);
3573 3670
3671 if (mask == TRACE_ITER_EVENT_FORK)
3672 trace_event_follow_fork(tr, enabled);
3673
3574 if (mask == TRACE_ITER_OVERWRITE) { 3674 if (mask == TRACE_ITER_OVERWRITE) {
3575 ring_buffer_change_overwrite(tr->trace_buffer.buffer, enabled); 3675 ring_buffer_change_overwrite(tr->trace_buffer.buffer, enabled);
3576#ifdef CONFIG_TRACER_MAX_TRACE 3676#ifdef CONFIG_TRACER_MAX_TRACE
@@ -3658,7 +3758,7 @@ tracing_trace_options_write(struct file *filp, const char __user *ubuf,
3658 if (cnt >= sizeof(buf)) 3758 if (cnt >= sizeof(buf))
3659 return -EINVAL; 3759 return -EINVAL;
3660 3760
3661 if (copy_from_user(&buf, ubuf, cnt)) 3761 if (copy_from_user(buf, ubuf, cnt))
3662 return -EFAULT; 3762 return -EFAULT;
3663 3763
3664 buf[cnt] = 0; 3764 buf[cnt] = 0;
@@ -3804,12 +3904,19 @@ static const char readme_msg[] =
3804 "\t trigger: traceon, traceoff\n" 3904 "\t trigger: traceon, traceoff\n"
3805 "\t enable_event:<system>:<event>\n" 3905 "\t enable_event:<system>:<event>\n"
3806 "\t disable_event:<system>:<event>\n" 3906 "\t disable_event:<system>:<event>\n"
3907#ifdef CONFIG_HIST_TRIGGERS
3908 "\t enable_hist:<system>:<event>\n"
3909 "\t disable_hist:<system>:<event>\n"
3910#endif
3807#ifdef CONFIG_STACKTRACE 3911#ifdef CONFIG_STACKTRACE
3808 "\t\t stacktrace\n" 3912 "\t\t stacktrace\n"
3809#endif 3913#endif
3810#ifdef CONFIG_TRACER_SNAPSHOT 3914#ifdef CONFIG_TRACER_SNAPSHOT
3811 "\t\t snapshot\n" 3915 "\t\t snapshot\n"
3812#endif 3916#endif
3917#ifdef CONFIG_HIST_TRIGGERS
3918 "\t\t hist (see below)\n"
3919#endif
3813 "\t example: echo traceoff > events/block/block_unplug/trigger\n" 3920 "\t example: echo traceoff > events/block/block_unplug/trigger\n"
3814 "\t echo traceoff:3 > events/block/block_unplug/trigger\n" 3921 "\t echo traceoff:3 > events/block/block_unplug/trigger\n"
3815 "\t echo 'enable_event:kmem:kmalloc:3 if nr_rq > 1' > \\\n" 3922 "\t echo 'enable_event:kmem:kmalloc:3 if nr_rq > 1' > \\\n"
@@ -3825,6 +3932,56 @@ static const char readme_msg[] =
3825 "\t To remove a trigger with a count:\n" 3932 "\t To remove a trigger with a count:\n"
3826 "\t echo '!<trigger>:0 > <system>/<event>/trigger\n" 3933 "\t echo '!<trigger>:0 > <system>/<event>/trigger\n"
3827 "\t Filters can be ignored when removing a trigger.\n" 3934 "\t Filters can be ignored when removing a trigger.\n"
3935#ifdef CONFIG_HIST_TRIGGERS
3936 " hist trigger\t- If set, event hits are aggregated into a hash table\n"
3937 "\t Format: hist:keys=<field1[,field2,...]>\n"
3938 "\t [:values=<field1[,field2,...]>]\n"
3939 "\t [:sort=<field1[,field2,...]>]\n"
3940 "\t [:size=#entries]\n"
3941 "\t [:pause][:continue][:clear]\n"
3942 "\t [:name=histname1]\n"
3943 "\t [if <filter>]\n\n"
3944 "\t When a matching event is hit, an entry is added to a hash\n"
3945 "\t table using the key(s) and value(s) named, and the value of a\n"
3946 "\t sum called 'hitcount' is incremented. Keys and values\n"
3947 "\t correspond to fields in the event's format description. Keys\n"
3948 "\t can be any field, or the special string 'stacktrace'.\n"
3949 "\t Compound keys consisting of up to two fields can be specified\n"
3950 "\t by the 'keys' keyword. Values must correspond to numeric\n"
3951 "\t fields. Sort keys consisting of up to two fields can be\n"
3952 "\t specified using the 'sort' keyword. The sort direction can\n"
3953 "\t be modified by appending '.descending' or '.ascending' to a\n"
3954 "\t sort field. The 'size' parameter can be used to specify more\n"
3955 "\t or fewer than the default 2048 entries for the hashtable size.\n"
3956 "\t If a hist trigger is given a name using the 'name' parameter,\n"
3957 "\t its histogram data will be shared with other triggers of the\n"
3958 "\t same name, and trigger hits will update this common data.\n\n"
3959 "\t Reading the 'hist' file for the event will dump the hash\n"
3960 "\t table in its entirety to stdout. If there are multiple hist\n"
3961 "\t triggers attached to an event, there will be a table for each\n"
3962 "\t trigger in the output. The table displayed for a named\n"
3963 "\t trigger will be the same as any other instance having the\n"
3964 "\t same name. The default format used to display a given field\n"
3965 "\t can be modified by appending any of the following modifiers\n"
3966 "\t to the field name, as applicable:\n\n"
3967 "\t .hex display a number as a hex value\n"
3968 "\t .sym display an address as a symbol\n"
3969 "\t .sym-offset display an address as a symbol and offset\n"
3970 "\t .execname display a common_pid as a program name\n"
 3971 "\t .syscall display a syscall id as a syscall name\n"
3972 "\t .log2 display log2 value rather than raw number\n\n"
3973 "\t The 'pause' parameter can be used to pause an existing hist\n"
3974 "\t trigger or to start a hist trigger but not log any events\n"
3975 "\t until told to do so. 'continue' can be used to start or\n"
3976 "\t restart a paused hist trigger.\n\n"
3977 "\t The 'clear' parameter will clear the contents of a running\n"
3978 "\t hist trigger and leave its current paused/active state\n"
3979 "\t unchanged.\n\n"
3980 "\t The enable_hist and disable_hist triggers can be used to\n"
3981 "\t have one event conditionally start and stop another event's\n"
 3982 "\t already-attached hist trigger. The syntax is analogous to\n"
3983 "\t the enable_event and disable_event triggers.\n"
3984#endif
3828; 3985;
3829 3986
3830static ssize_t 3987static ssize_t
@@ -4474,7 +4631,7 @@ tracing_set_trace_write(struct file *filp, const char __user *ubuf,
4474 if (cnt > MAX_TRACER_SIZE) 4631 if (cnt > MAX_TRACER_SIZE)
4475 cnt = MAX_TRACER_SIZE; 4632 cnt = MAX_TRACER_SIZE;
4476 4633
4477 if (copy_from_user(&buf, ubuf, cnt)) 4634 if (copy_from_user(buf, ubuf, cnt))
4478 return -EFAULT; 4635 return -EFAULT;
4479 4636
4480 buf[cnt] = 0; 4637 buf[cnt] = 0;
@@ -5264,7 +5421,7 @@ static ssize_t tracing_clock_write(struct file *filp, const char __user *ubuf,
5264 if (cnt >= sizeof(buf)) 5421 if (cnt >= sizeof(buf))
5265 return -EINVAL; 5422 return -EINVAL;
5266 5423
5267 if (copy_from_user(&buf, ubuf, cnt)) 5424 if (copy_from_user(buf, ubuf, cnt))
5268 return -EFAULT; 5425 return -EFAULT;
5269 5426
5270 buf[cnt] = 0; 5427 buf[cnt] = 0;
@@ -6650,7 +6807,7 @@ static int instance_mkdir(const char *name)
6650 if (!alloc_cpumask_var(&tr->tracing_cpumask, GFP_KERNEL)) 6807 if (!alloc_cpumask_var(&tr->tracing_cpumask, GFP_KERNEL))
6651 goto out_free_tr; 6808 goto out_free_tr;
6652 6809
6653 tr->trace_flags = global_trace.trace_flags; 6810 tr->trace_flags = global_trace.trace_flags & ~ZEROED_TRACE_FLAGS;
6654 6811
6655 cpumask_copy(tr->tracing_cpumask, cpu_all_mask); 6812 cpumask_copy(tr->tracing_cpumask, cpu_all_mask);
6656 6813
@@ -6724,6 +6881,12 @@ static int instance_rmdir(const char *name)
6724 6881
6725 list_del(&tr->list); 6882 list_del(&tr->list);
6726 6883
6884 /* Disable all the flags that were enabled coming in */
6885 for (i = 0; i < TRACE_FLAGS_MAX_SIZE; i++) {
6886 if ((1 << i) & ZEROED_TRACE_FLAGS)
6887 set_tracer_flag(tr, 1 << i, 0);
6888 }
6889
6727 tracing_set_nop(tr); 6890 tracing_set_nop(tr);
6728 event_trace_del_tracer(tr); 6891 event_trace_del_tracer(tr);
6729 ftrace_destroy_function_files(tr); 6892 ftrace_destroy_function_files(tr);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 3fff4adfd431..5167c366d6b7 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -177,9 +177,8 @@ struct trace_options {
177}; 177};
178 178
179struct trace_pid_list { 179struct trace_pid_list {
180 unsigned int nr_pids; 180 int pid_max;
181 int order; 181 unsigned long *pids;
182 pid_t *pids;
183}; 182};
184 183
185/* 184/*
@@ -656,6 +655,7 @@ static inline void __trace_stack(struct trace_array *tr, unsigned long flags,
656extern cycle_t ftrace_now(int cpu); 655extern cycle_t ftrace_now(int cpu);
657 656
658extern void trace_find_cmdline(int pid, char comm[]); 657extern void trace_find_cmdline(int pid, char comm[]);
658extern void trace_event_follow_fork(struct trace_array *tr, bool enable);
659 659
660#ifdef CONFIG_DYNAMIC_FTRACE 660#ifdef CONFIG_DYNAMIC_FTRACE
661extern unsigned long ftrace_update_tot_cnt; 661extern unsigned long ftrace_update_tot_cnt;
@@ -967,6 +967,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf,
967 C(STOP_ON_FREE, "disable_on_free"), \ 967 C(STOP_ON_FREE, "disable_on_free"), \
968 C(IRQ_INFO, "irq-info"), \ 968 C(IRQ_INFO, "irq-info"), \
969 C(MARKERS, "markers"), \ 969 C(MARKERS, "markers"), \
970 C(EVENT_FORK, "event-fork"), \
970 FUNCTION_FLAGS \ 971 FUNCTION_FLAGS \
971 FGRAPH_FLAGS \ 972 FGRAPH_FLAGS \
972 STACK_FLAGS \ 973 STACK_FLAGS \
@@ -1064,6 +1065,137 @@ struct trace_subsystem_dir {
1064 int nr_events; 1065 int nr_events;
1065}; 1066};
1066 1067
1068extern int call_filter_check_discard(struct trace_event_call *call, void *rec,
1069 struct ring_buffer *buffer,
1070 struct ring_buffer_event *event);
1071
1072void trace_buffer_unlock_commit_regs(struct trace_array *tr,
1073 struct ring_buffer *buffer,
1074 struct ring_buffer_event *event,
1075 unsigned long flags, int pc,
1076 struct pt_regs *regs);
1077
1078static inline void trace_buffer_unlock_commit(struct trace_array *tr,
1079 struct ring_buffer *buffer,
1080 struct ring_buffer_event *event,
1081 unsigned long flags, int pc)
1082{
1083 trace_buffer_unlock_commit_regs(tr, buffer, event, flags, pc, NULL);
1084}
1085
1086DECLARE_PER_CPU(struct ring_buffer_event *, trace_buffered_event);
1087DECLARE_PER_CPU(int, trace_buffered_event_cnt);
1088void trace_buffered_event_disable(void);
1089void trace_buffered_event_enable(void);
1090
1091static inline void
1092__trace_event_discard_commit(struct ring_buffer *buffer,
1093 struct ring_buffer_event *event)
1094{
1095 if (this_cpu_read(trace_buffered_event) == event) {
1096 /* Simply release the temp buffer */
1097 this_cpu_dec(trace_buffered_event_cnt);
1098 return;
1099 }
1100 ring_buffer_discard_commit(buffer, event);
1101}
1102
1103/*
1104 * Helper function for event_trigger_unlock_commit{_regs}().
 1105 * If there are event triggers attached to this event that require
 1106 * filtering against its fields, then they will be called as the
1107 * entry already holds the field information of the current event.
1108 *
1109 * It also checks if the event should be discarded or not.
1110 * It is to be discarded if the event is soft disabled and the
1111 * event was only recorded to process triggers, or if the event
1112 * filter is active and this event did not match the filters.
1113 *
1114 * Returns true if the event is discarded, false otherwise.
1115 */
1116static inline bool
1117__event_trigger_test_discard(struct trace_event_file *file,
1118 struct ring_buffer *buffer,
1119 struct ring_buffer_event *event,
1120 void *entry,
1121 enum event_trigger_type *tt)
1122{
1123 unsigned long eflags = file->flags;
1124
1125 if (eflags & EVENT_FILE_FL_TRIGGER_COND)
1126 *tt = event_triggers_call(file, entry);
1127
1128 if (test_bit(EVENT_FILE_FL_SOFT_DISABLED_BIT, &file->flags) ||
1129 (unlikely(file->flags & EVENT_FILE_FL_FILTERED) &&
1130 !filter_match_preds(file->filter, entry))) {
1131 __trace_event_discard_commit(buffer, event);
1132 return true;
1133 }
1134
1135 return false;
1136}
1137
1138/**
1139 * event_trigger_unlock_commit - handle triggers and finish event commit
 1140 * @file: The file pointer associated with the event
1141 * @buffer: The ring buffer that the event is being written to
1142 * @event: The event meta data in the ring buffer
1143 * @entry: The event itself
1144 * @irq_flags: The state of the interrupts at the start of the event
1145 * @pc: The state of the preempt count at the start of the event.
1146 *
1147 * This is a helper function to handle triggers that require data
1148 * from the event itself. It also tests the event against filters and
1149 * if the event is soft disabled and should be discarded.
1150 */
1151static inline void
1152event_trigger_unlock_commit(struct trace_event_file *file,
1153 struct ring_buffer *buffer,
1154 struct ring_buffer_event *event,
1155 void *entry, unsigned long irq_flags, int pc)
1156{
1157 enum event_trigger_type tt = ETT_NONE;
1158
1159 if (!__event_trigger_test_discard(file, buffer, event, entry, &tt))
1160 trace_buffer_unlock_commit(file->tr, buffer, event, irq_flags, pc);
1161
1162 if (tt)
1163 event_triggers_post_call(file, tt, entry);
1164}
1165
1166/**
1167 * event_trigger_unlock_commit_regs - handle triggers and finish event commit
 1168 * @file: The file pointer associated with the event
1169 * @buffer: The ring buffer that the event is being written to
1170 * @event: The event meta data in the ring buffer
1171 * @entry: The event itself
1172 * @irq_flags: The state of the interrupts at the start of the event
1173 * @pc: The state of the preempt count at the start of the event.
1174 *
1175 * This is a helper function to handle triggers that require data
1176 * from the event itself. It also tests the event against filters and
1177 * if the event is soft disabled and should be discarded.
1178 *
1179 * Same as event_trigger_unlock_commit() but calls
1180 * trace_buffer_unlock_commit_regs() instead of trace_buffer_unlock_commit().
1181 */
1182static inline void
1183event_trigger_unlock_commit_regs(struct trace_event_file *file,
1184 struct ring_buffer *buffer,
1185 struct ring_buffer_event *event,
1186 void *entry, unsigned long irq_flags, int pc,
1187 struct pt_regs *regs)
1188{
1189 enum event_trigger_type tt = ETT_NONE;
1190
1191 if (!__event_trigger_test_discard(file, buffer, event, entry, &tt))
1192 trace_buffer_unlock_commit_regs(file->tr, buffer, event,
1193 irq_flags, pc, regs);
1194
1195 if (tt)
1196 event_triggers_post_call(file, tt, entry);
1197}
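
[Editor's sketch] A minimal usage example of this helper, modeled on the self-test conversion later in this patch (trace_events.c); the TRACE_FN payload is just the example used there, not a new API:

    struct ftrace_entry *entry;
    struct ring_buffer_event *event;
    struct ring_buffer *buffer;
    unsigned long flags;
    int pc;

    local_save_flags(flags);
    pc = preempt_count();
    event = trace_event_buffer_lock_reserve(&buffer, file, TRACE_FN,
                                            sizeof(*entry), flags, pc);
    if (!event)
            return;
    entry = ring_buffer_event_data(event);
    entry->ip = ip;
    entry->parent_ip = parent_ip;
    /* runs any attached triggers, applies filters, then commits or discards */
    event_trigger_unlock_commit(file, buffer, event, entry, flags, pc);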
1198
1067#define FILTER_PRED_INVALID ((unsigned short)-1) 1199#define FILTER_PRED_INVALID ((unsigned short)-1)
1068#define FILTER_PRED_IS_RIGHT (1 << 15) 1200#define FILTER_PRED_IS_RIGHT (1 << 15)
1069#define FILTER_PRED_FOLD (1 << 15) 1201#define FILTER_PRED_FOLD (1 << 15)
@@ -1161,6 +1293,15 @@ extern struct mutex event_mutex;
1161extern struct list_head ftrace_events; 1293extern struct list_head ftrace_events;
1162 1294
1163extern const struct file_operations event_trigger_fops; 1295extern const struct file_operations event_trigger_fops;
1296extern const struct file_operations event_hist_fops;
1297
1298#ifdef CONFIG_HIST_TRIGGERS
1299extern int register_trigger_hist_cmd(void);
1300extern int register_trigger_hist_enable_disable_cmds(void);
1301#else
1302static inline int register_trigger_hist_cmd(void) { return 0; }
1303static inline int register_trigger_hist_enable_disable_cmds(void) { return 0; }
1304#endif
1164 1305
1165extern int register_trigger_cmds(void); 1306extern int register_trigger_cmds(void);
1166extern void clear_event_triggers(struct trace_array *tr); 1307extern void clear_event_triggers(struct trace_array *tr);
@@ -1174,9 +1315,41 @@ struct event_trigger_data {
1174 char *filter_str; 1315 char *filter_str;
1175 void *private_data; 1316 void *private_data;
1176 bool paused; 1317 bool paused;
1318 bool paused_tmp;
1177 struct list_head list; 1319 struct list_head list;
1320 char *name;
1321 struct list_head named_list;
1322 struct event_trigger_data *named_data;
1323};
1324
1325/* Avoid typos */
1326#define ENABLE_EVENT_STR "enable_event"
1327#define DISABLE_EVENT_STR "disable_event"
1328#define ENABLE_HIST_STR "enable_hist"
1329#define DISABLE_HIST_STR "disable_hist"
1330
1331struct enable_trigger_data {
1332 struct trace_event_file *file;
1333 bool enable;
1334 bool hist;
1178}; 1335};
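
[Editor's sketch] How an enable/disable trigger string plausibly maps onto struct enable_trigger_data; the field assignments below are inferred from the names (the actual parsing lives in event_enable_trigger_func()), so treat them as an assumption:

    /* echo 'enable_hist:sched:sched_switch' > events/<subsys>/<event>/trigger */
    enable_data->file   = <trace_event_file for sched:sched_switch>;
    enable_data->enable = true;   /* ENABLE_HIST_STR rather than DISABLE_HIST_STR */
    enable_data->hist   = true;   /* enable_hist rather than enable_event */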
1179 1336
1337extern int event_enable_trigger_print(struct seq_file *m,
1338 struct event_trigger_ops *ops,
1339 struct event_trigger_data *data);
1340extern void event_enable_trigger_free(struct event_trigger_ops *ops,
1341 struct event_trigger_data *data);
1342extern int event_enable_trigger_func(struct event_command *cmd_ops,
1343 struct trace_event_file *file,
1344 char *glob, char *cmd, char *param);
1345extern int event_enable_register_trigger(char *glob,
1346 struct event_trigger_ops *ops,
1347 struct event_trigger_data *data,
1348 struct trace_event_file *file);
1349extern void event_enable_unregister_trigger(char *glob,
1350 struct event_trigger_ops *ops,
1351 struct event_trigger_data *test,
1352 struct trace_event_file *file);
1180extern void trigger_data_free(struct event_trigger_data *data); 1353extern void trigger_data_free(struct event_trigger_data *data);
1181extern int event_trigger_init(struct event_trigger_ops *ops, 1354extern int event_trigger_init(struct event_trigger_ops *ops,
1182 struct event_trigger_data *data); 1355 struct event_trigger_data *data);
@@ -1189,7 +1362,18 @@ extern void unregister_trigger(char *glob, struct event_trigger_ops *ops,
1189extern int set_trigger_filter(char *filter_str, 1362extern int set_trigger_filter(char *filter_str,
1190 struct event_trigger_data *trigger_data, 1363 struct event_trigger_data *trigger_data,
1191 struct trace_event_file *file); 1364 struct trace_event_file *file);
1365extern struct event_trigger_data *find_named_trigger(const char *name);
1366extern bool is_named_trigger(struct event_trigger_data *test);
1367extern int save_named_trigger(const char *name,
1368 struct event_trigger_data *data);
1369extern void del_named_trigger(struct event_trigger_data *data);
1370extern void pause_named_trigger(struct event_trigger_data *data);
1371extern void unpause_named_trigger(struct event_trigger_data *data);
1372extern void set_named_trigger_data(struct event_trigger_data *data,
1373 struct event_trigger_data *named_data);
1192extern int register_event_command(struct event_command *cmd); 1374extern int register_event_command(struct event_command *cmd);
1375extern int unregister_event_command(struct event_command *cmd);
1376extern int register_trigger_hist_enable_disable_cmds(void);
1193 1377
1194/** 1378/**
1195 * struct event_trigger_ops - callbacks for trace event triggers 1379 * struct event_trigger_ops - callbacks for trace event triggers
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index b7b0760ba6ee..3d4155892a1e 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -15,7 +15,7 @@
15#include <linux/kthread.h> 15#include <linux/kthread.h>
16#include <linux/tracefs.h> 16#include <linux/tracefs.h>
17#include <linux/uaccess.h> 17#include <linux/uaccess.h>
18#include <linux/bsearch.h> 18#include <linux/vmalloc.h>
19#include <linux/module.h> 19#include <linux/module.h>
20#include <linux/ctype.h> 20#include <linux/ctype.h>
21#include <linux/sort.h> 21#include <linux/sort.h>
@@ -381,6 +381,7 @@ static int __ftrace_event_enable_disable(struct trace_event_file *file,
381{ 381{
382 struct trace_event_call *call = file->event_call; 382 struct trace_event_call *call = file->event_call;
383 struct trace_array *tr = file->tr; 383 struct trace_array *tr = file->tr;
384 unsigned long file_flags = file->flags;
384 int ret = 0; 385 int ret = 0;
385 int disable; 386 int disable;
386 387
@@ -463,6 +464,15 @@ static int __ftrace_event_enable_disable(struct trace_event_file *file,
463 break; 464 break;
464 } 465 }
465 466
467 /* Enable or disable use of trace_buffered_event */
468 if ((file_flags & EVENT_FILE_FL_SOFT_DISABLED) !=
469 (file->flags & EVENT_FILE_FL_SOFT_DISABLED)) {
470 if (file->flags & EVENT_FILE_FL_SOFT_DISABLED)
471 trace_buffered_event_enable();
472 else
473 trace_buffered_event_disable();
474 }
475
466 return ret; 476 return ret;
467} 477}
468 478
@@ -489,24 +499,26 @@ static void ftrace_clear_events(struct trace_array *tr)
489 mutex_unlock(&event_mutex); 499 mutex_unlock(&event_mutex);
490} 500}
491 501
492static int cmp_pid(const void *key, const void *elt) 502/* Shouldn't this be in a header? */
503extern int pid_max;
504
505/* Returns true if found in filter */
506static bool
507find_filtered_pid(struct trace_pid_list *filtered_pids, pid_t search_pid)
493{ 508{
494 const pid_t *search_pid = key; 509 /*
495 const pid_t *pid = elt; 510 * If pid_max changed after filtered_pids was created, we
511 * by default ignore all pids greater than the previous pid_max.
512 */
513 if (search_pid >= filtered_pids->pid_max)
514 return false;
496 515
497 if (*search_pid == *pid) 516 return test_bit(search_pid, filtered_pids->pids);
498 return 0;
499 if (*search_pid < *pid)
500 return -1;
501 return 1;
502} 517}
503 518
504static bool 519static bool
505check_ignore_pid(struct trace_pid_list *filtered_pids, struct task_struct *task) 520ignore_this_task(struct trace_pid_list *filtered_pids, struct task_struct *task)
506{ 521{
507 pid_t search_pid;
508 pid_t *pid;
509
510 /* 522 /*
511 * Return false, because if filtered_pids does not exist, 523 * Return false, because if filtered_pids does not exist,
512 * all pids are good to trace. 524 * all pids are good to trace.
@@ -514,15 +526,68 @@ check_ignore_pid(struct trace_pid_list *filtered_pids, struct task_struct *task)
514 if (!filtered_pids) 526 if (!filtered_pids)
515 return false; 527 return false;
516 528
517 search_pid = task->pid; 529 return !find_filtered_pid(filtered_pids, task->pid);
530}
531
532static void filter_add_remove_task(struct trace_pid_list *pid_list,
533 struct task_struct *self,
534 struct task_struct *task)
535{
536 if (!pid_list)
537 return;
538
539 /* For forks, we only add if the forking task is listed */
540 if (self) {
541 if (!find_filtered_pid(pid_list, self->pid))
542 return;
543 }
518 544
519 pid = bsearch(&search_pid, filtered_pids->pids, 545 /* Sorry, but we don't support pid_max changing after setting */
520 filtered_pids->nr_pids, sizeof(pid_t), 546 if (task->pid >= pid_list->pid_max)
521 cmp_pid); 547 return;
522 if (!pid)
523 return true;
524 548
525 return false; 549 /* "self" is set for forks, and NULL for exits */
550 if (self)
551 set_bit(task->pid, pid_list->pids);
552 else
553 clear_bit(task->pid, pid_list->pids);
554}
555
556static void
557event_filter_pid_sched_process_exit(void *data, struct task_struct *task)
558{
559 struct trace_pid_list *pid_list;
560 struct trace_array *tr = data;
561
562 pid_list = rcu_dereference_sched(tr->filtered_pids);
563 filter_add_remove_task(pid_list, NULL, task);
564}
565
566static void
567event_filter_pid_sched_process_fork(void *data,
568 struct task_struct *self,
569 struct task_struct *task)
570{
571 struct trace_pid_list *pid_list;
572 struct trace_array *tr = data;
573
574 pid_list = rcu_dereference_sched(tr->filtered_pids);
575 filter_add_remove_task(pid_list, self, task);
576}
577
578void trace_event_follow_fork(struct trace_array *tr, bool enable)
579{
580 if (enable) {
581 register_trace_prio_sched_process_fork(event_filter_pid_sched_process_fork,
582 tr, INT_MIN);
583 register_trace_prio_sched_process_exit(event_filter_pid_sched_process_exit,
584 tr, INT_MAX);
585 } else {
586 unregister_trace_sched_process_fork(event_filter_pid_sched_process_fork,
587 tr);
588 unregister_trace_sched_process_exit(event_filter_pid_sched_process_exit,
589 tr);
590 }
526} 591}
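
[Editor's note] With these probes registered, children of a filtered task are added to the pid filter on fork and removed again on exit. Assuming trace_event_follow_fork() is wired up to an "event-fork" option (as elsewhere in this series), usage would look like:

    echo 1234 > set_event_pid
    echo 1 > options/event-fork      # descendants of 1234 now follow the filter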
527 592
528static void 593static void
@@ -535,8 +600,8 @@ event_filter_pid_sched_switch_probe_pre(void *data, bool preempt,
535 pid_list = rcu_dereference_sched(tr->filtered_pids); 600 pid_list = rcu_dereference_sched(tr->filtered_pids);
536 601
537 this_cpu_write(tr->trace_buffer.data->ignore_pid, 602 this_cpu_write(tr->trace_buffer.data->ignore_pid,
538 check_ignore_pid(pid_list, prev) && 603 ignore_this_task(pid_list, prev) &&
539 check_ignore_pid(pid_list, next)); 604 ignore_this_task(pid_list, next));
540} 605}
541 606
542static void 607static void
@@ -549,7 +614,7 @@ event_filter_pid_sched_switch_probe_post(void *data, bool preempt,
549 pid_list = rcu_dereference_sched(tr->filtered_pids); 614 pid_list = rcu_dereference_sched(tr->filtered_pids);
550 615
551 this_cpu_write(tr->trace_buffer.data->ignore_pid, 616 this_cpu_write(tr->trace_buffer.data->ignore_pid,
552 check_ignore_pid(pid_list, next)); 617 ignore_this_task(pid_list, next));
553} 618}
554 619
555static void 620static void
@@ -565,7 +630,7 @@ event_filter_pid_sched_wakeup_probe_pre(void *data, struct task_struct *task)
565 pid_list = rcu_dereference_sched(tr->filtered_pids); 630 pid_list = rcu_dereference_sched(tr->filtered_pids);
566 631
567 this_cpu_write(tr->trace_buffer.data->ignore_pid, 632 this_cpu_write(tr->trace_buffer.data->ignore_pid,
568 check_ignore_pid(pid_list, task)); 633 ignore_this_task(pid_list, task));
569} 634}
570 635
571static void 636static void
@@ -582,7 +647,7 @@ event_filter_pid_sched_wakeup_probe_post(void *data, struct task_struct *task)
582 647
583 /* Set tracing if current is enabled */ 648 /* Set tracing if current is enabled */
584 this_cpu_write(tr->trace_buffer.data->ignore_pid, 649 this_cpu_write(tr->trace_buffer.data->ignore_pid,
585 check_ignore_pid(pid_list, current)); 650 ignore_this_task(pid_list, current));
586} 651}
587 652
588static void __ftrace_clear_event_pids(struct trace_array *tr) 653static void __ftrace_clear_event_pids(struct trace_array *tr)
@@ -620,7 +685,7 @@ static void __ftrace_clear_event_pids(struct trace_array *tr)
620 /* Wait till all users are no longer using pid filtering */ 685 /* Wait till all users are no longer using pid filtering */
621 synchronize_sched(); 686 synchronize_sched();
622 687
623 free_pages((unsigned long)pid_list->pids, pid_list->order); 688 vfree(pid_list->pids);
624 kfree(pid_list); 689 kfree(pid_list);
625} 690}
626 691
@@ -964,11 +1029,32 @@ static void t_stop(struct seq_file *m, void *p)
964 mutex_unlock(&event_mutex); 1029 mutex_unlock(&event_mutex);
965} 1030}
966 1031
1032static void *
1033p_next(struct seq_file *m, void *v, loff_t *pos)
1034{
1035 struct trace_array *tr = m->private;
1036 struct trace_pid_list *pid_list = rcu_dereference_sched(tr->filtered_pids);
1037 unsigned long pid = (unsigned long)v;
1038
1039 (*pos)++;
1040
 1041 /* pid is already +1 of the actual previous bit */
1042 pid = find_next_bit(pid_list->pids, pid_list->pid_max, pid);
1043
1044 /* Return pid + 1 to allow zero to be represented */
1045 if (pid < pid_list->pid_max)
1046 return (void *)(pid + 1);
1047
1048 return NULL;
1049}
1050
967static void *p_start(struct seq_file *m, loff_t *pos) 1051static void *p_start(struct seq_file *m, loff_t *pos)
968 __acquires(RCU) 1052 __acquires(RCU)
969{ 1053{
970 struct trace_pid_list *pid_list; 1054 struct trace_pid_list *pid_list;
971 struct trace_array *tr = m->private; 1055 struct trace_array *tr = m->private;
1056 unsigned long pid;
1057 loff_t l = 0;
972 1058
973 /* 1059 /*
974 * Grab the mutex, to keep calls to p_next() having the same 1060 * Grab the mutex, to keep calls to p_next() having the same
@@ -981,10 +1067,18 @@ static void *p_start(struct seq_file *m, loff_t *pos)
981 1067
982 pid_list = rcu_dereference_sched(tr->filtered_pids); 1068 pid_list = rcu_dereference_sched(tr->filtered_pids);
983 1069
984 if (!pid_list || *pos >= pid_list->nr_pids) 1070 if (!pid_list)
1071 return NULL;
1072
1073 pid = find_first_bit(pid_list->pids, pid_list->pid_max);
1074 if (pid >= pid_list->pid_max)
985 return NULL; 1075 return NULL;
986 1076
987 return (void *)&pid_list->pids[*pos]; 1077 /* Return pid + 1 so that zero can be the exit value */
1078 for (pid++; pid && l < *pos;
1079 pid = (unsigned long)p_next(m, (void *)pid, &l))
1080 ;
1081 return (void *)pid;
988} 1082}
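
[Editor's note] The +1 exists because seq_file treats a NULL iterator value as end-of-iteration, so pid 0 could not be returned directly. Encode and decode are simple inverses:

    v   = (void *)(pid + 1);        /* p_start()/p_next() encode */
    pid = (unsigned long)v - 1;     /* p_show() decodes, below */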
989 1083
990static void p_stop(struct seq_file *m, void *p) 1084static void p_stop(struct seq_file *m, void *p)
@@ -994,25 +1088,11 @@ static void p_stop(struct seq_file *m, void *p)
994 mutex_unlock(&event_mutex); 1088 mutex_unlock(&event_mutex);
995} 1089}
996 1090
997static void *
998p_next(struct seq_file *m, void *v, loff_t *pos)
999{
1000 struct trace_array *tr = m->private;
1001 struct trace_pid_list *pid_list = rcu_dereference_sched(tr->filtered_pids);
1002
1003 (*pos)++;
1004
1005 if (*pos >= pid_list->nr_pids)
1006 return NULL;
1007
1008 return (void *)&pid_list->pids[*pos];
1009}
1010
1011static int p_show(struct seq_file *m, void *v) 1091static int p_show(struct seq_file *m, void *v)
1012{ 1092{
1013 pid_t *pid = v; 1093 unsigned long pid = (unsigned long)v - 1;
1014 1094
1015 seq_printf(m, "%d\n", *pid); 1095 seq_printf(m, "%lu\n", pid);
1016 return 0; 1096 return 0;
1017} 1097}
1018 1098
@@ -1561,11 +1641,6 @@ show_header(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
1561 return r; 1641 return r;
1562} 1642}
1563 1643
1564static int max_pids(struct trace_pid_list *pid_list)
1565{
1566 return (PAGE_SIZE << pid_list->order) / sizeof(pid_t);
1567}
1568
1569static void ignore_task_cpu(void *data) 1644static void ignore_task_cpu(void *data)
1570{ 1645{
1571 struct trace_array *tr = data; 1646 struct trace_array *tr = data;
@@ -1579,7 +1654,7 @@ static void ignore_task_cpu(void *data)
1579 mutex_is_locked(&event_mutex)); 1654 mutex_is_locked(&event_mutex));
1580 1655
1581 this_cpu_write(tr->trace_buffer.data->ignore_pid, 1656 this_cpu_write(tr->trace_buffer.data->ignore_pid,
1582 check_ignore_pid(pid_list, current)); 1657 ignore_this_task(pid_list, current));
1583} 1658}
1584 1659
1585static ssize_t 1660static ssize_t
@@ -1589,7 +1664,7 @@ ftrace_event_pid_write(struct file *filp, const char __user *ubuf,
1589 struct seq_file *m = filp->private_data; 1664 struct seq_file *m = filp->private_data;
1590 struct trace_array *tr = m->private; 1665 struct trace_array *tr = m->private;
1591 struct trace_pid_list *filtered_pids = NULL; 1666 struct trace_pid_list *filtered_pids = NULL;
1592 struct trace_pid_list *pid_list = NULL; 1667 struct trace_pid_list *pid_list;
1593 struct trace_event_file *file; 1668 struct trace_event_file *file;
1594 struct trace_parser parser; 1669 struct trace_parser parser;
1595 unsigned long val; 1670 unsigned long val;
@@ -1597,7 +1672,7 @@ ftrace_event_pid_write(struct file *filp, const char __user *ubuf,
1597 ssize_t read = 0; 1672 ssize_t read = 0;
1598 ssize_t ret = 0; 1673 ssize_t ret = 0;
1599 pid_t pid; 1674 pid_t pid;
1600 int i; 1675 int nr_pids = 0;
1601 1676
1602 if (!cnt) 1677 if (!cnt)
1603 return 0; 1678 return 0;
@@ -1610,10 +1685,43 @@ ftrace_event_pid_write(struct file *filp, const char __user *ubuf,
1610 return -ENOMEM; 1685 return -ENOMEM;
1611 1686
1612 mutex_lock(&event_mutex); 1687 mutex_lock(&event_mutex);
1688 filtered_pids = rcu_dereference_protected(tr->filtered_pids,
1689 lockdep_is_held(&event_mutex));
1690
1613 /* 1691 /*
1614 * Load as many pids into the array before doing a 1692 * Always recreate a new array. The write is an all or nothing
1615 * swap from the tr->filtered_pids to the new list. 1693 * operation. Always create a new array when adding new pids by
1694 * the user. If the operation fails, then the current list is
1695 * not modified.
1616 */ 1696 */
1697 pid_list = kmalloc(sizeof(*pid_list), GFP_KERNEL);
1698 if (!pid_list) {
1699 read = -ENOMEM;
1700 goto out;
1701 }
1702 pid_list->pid_max = READ_ONCE(pid_max);
1703 /* Only truncating will shrink pid_max */
1704 if (filtered_pids && filtered_pids->pid_max > pid_list->pid_max)
1705 pid_list->pid_max = filtered_pids->pid_max;
1706 pid_list->pids = vzalloc((pid_list->pid_max + 7) >> 3);
1707 if (!pid_list->pids) {
1708 kfree(pid_list);
1709 read = -ENOMEM;
1710 goto out;
1711 }
1712 if (filtered_pids) {
1713 /* copy the current bits to the new max */
1714 pid = find_first_bit(filtered_pids->pids,
1715 filtered_pids->pid_max);
1716 while (pid < filtered_pids->pid_max) {
1717 set_bit(pid, pid_list->pids);
1718 pid = find_next_bit(filtered_pids->pids,
1719 filtered_pids->pid_max,
1720 pid + 1);
1721 nr_pids++;
1722 }
1723 }
1724
1617 while (cnt > 0) { 1725 while (cnt > 0) {
1618 1726
1619 this_pos = 0; 1727 this_pos = 0;
@@ -1631,92 +1739,35 @@ ftrace_event_pid_write(struct file *filp, const char __user *ubuf,
1631 ret = -EINVAL; 1739 ret = -EINVAL;
1632 if (kstrtoul(parser.buffer, 0, &val)) 1740 if (kstrtoul(parser.buffer, 0, &val))
1633 break; 1741 break;
1634 if (val > INT_MAX) 1742 if (val >= pid_list->pid_max)
1635 break; 1743 break;
1636 1744
1637 pid = (pid_t)val; 1745 pid = (pid_t)val;
1638 1746
1639 ret = -ENOMEM; 1747 set_bit(pid, pid_list->pids);
1640 if (!pid_list) { 1748 nr_pids++;
1641 pid_list = kmalloc(sizeof(*pid_list), GFP_KERNEL);
1642 if (!pid_list)
1643 break;
1644 1749
1645 filtered_pids = rcu_dereference_protected(tr->filtered_pids,
1646 lockdep_is_held(&event_mutex));
1647 if (filtered_pids)
1648 pid_list->order = filtered_pids->order;
1649 else
1650 pid_list->order = 0;
1651
1652 pid_list->pids = (void *)__get_free_pages(GFP_KERNEL,
1653 pid_list->order);
1654 if (!pid_list->pids)
1655 break;
1656
1657 if (filtered_pids) {
1658 pid_list->nr_pids = filtered_pids->nr_pids;
1659 memcpy(pid_list->pids, filtered_pids->pids,
1660 pid_list->nr_pids * sizeof(pid_t));
1661 } else
1662 pid_list->nr_pids = 0;
1663 }
1664
1665 if (pid_list->nr_pids >= max_pids(pid_list)) {
1666 pid_t *pid_page;
1667
1668 pid_page = (void *)__get_free_pages(GFP_KERNEL,
1669 pid_list->order + 1);
1670 if (!pid_page)
1671 break;
1672 memcpy(pid_page, pid_list->pids,
1673 pid_list->nr_pids * sizeof(pid_t));
1674 free_pages((unsigned long)pid_list->pids, pid_list->order);
1675
1676 pid_list->order++;
1677 pid_list->pids = pid_page;
1678 }
1679
1680 pid_list->pids[pid_list->nr_pids++] = pid;
1681 trace_parser_clear(&parser); 1750 trace_parser_clear(&parser);
1682 ret = 0; 1751 ret = 0;
1683 } 1752 }
1684 trace_parser_put(&parser); 1753 trace_parser_put(&parser);
1685 1754
1686 if (ret < 0) { 1755 if (ret < 0) {
1687 if (pid_list) 1756 vfree(pid_list->pids);
1688 free_pages((unsigned long)pid_list->pids, pid_list->order);
1689 kfree(pid_list); 1757 kfree(pid_list);
1690 mutex_unlock(&event_mutex); 1758 read = ret;
1691 return ret; 1759 goto out;
1692 }
1693
1694 if (!pid_list) {
1695 mutex_unlock(&event_mutex);
1696 return ret;
1697 } 1760 }
1698 1761
1699 sort(pid_list->pids, pid_list->nr_pids, sizeof(pid_t), cmp_pid, NULL); 1762 if (!nr_pids) {
1700 1763 /* Cleared the list of pids */
1701 /* Remove duplicates */ 1764 vfree(pid_list->pids);
1702 for (i = 1; i < pid_list->nr_pids; i++) { 1765 kfree(pid_list);
1703 int start = i; 1766 read = ret;
1704 1767 if (!filtered_pids)
1705 while (i < pid_list->nr_pids && 1768 goto out;
1706 pid_list->pids[i - 1] == pid_list->pids[i]) 1769 pid_list = NULL;
1707 i++;
1708
1709 if (start != i) {
1710 if (i < pid_list->nr_pids) {
1711 memmove(&pid_list->pids[start], &pid_list->pids[i],
1712 (pid_list->nr_pids - i) * sizeof(pid_t));
1713 pid_list->nr_pids -= i - start;
1714 i = start;
1715 } else
1716 pid_list->nr_pids = start;
1717 }
1718 } 1770 }
1719
1720 rcu_assign_pointer(tr->filtered_pids, pid_list); 1771 rcu_assign_pointer(tr->filtered_pids, pid_list);
1721 1772
1722 list_for_each_entry(file, &tr->events, list) { 1773 list_for_each_entry(file, &tr->events, list) {
@@ -1726,7 +1777,7 @@ ftrace_event_pid_write(struct file *filp, const char __user *ubuf,
1726 if (filtered_pids) { 1777 if (filtered_pids) {
1727 synchronize_sched(); 1778 synchronize_sched();
1728 1779
1729 free_pages((unsigned long)filtered_pids->pids, filtered_pids->order); 1780 vfree(filtered_pids->pids);
1730 kfree(filtered_pids); 1781 kfree(filtered_pids);
1731 } else { 1782 } else {
1732 /* 1783 /*
@@ -1763,10 +1814,12 @@ ftrace_event_pid_write(struct file *filp, const char __user *ubuf,
1763 */ 1814 */
1764 on_each_cpu(ignore_task_cpu, tr, 1); 1815 on_each_cpu(ignore_task_cpu, tr, 1);
1765 1816
1817 out:
1766 mutex_unlock(&event_mutex); 1818 mutex_unlock(&event_mutex);
1767 1819
1768 ret = read; 1820 ret = read;
1769 *ppos += read; 1821 if (read > 0)
1822 *ppos += read;
1770 1823
1771 return ret; 1824 return ret;
1772} 1825}
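
[Editor's note] The rewritten write path is all-or-nothing: a fresh bitmap is always built, the current bits are copied in first, and the old list is only swapped out (and freed after synchronize_sched()) on success. Assuming the usual semantics where an O_TRUNC open clears the list first, usage looks like:

    echo 123 > set_event_pid      # start a fresh filter containing pid 123
    echo 456 >> set_event_pid     # append: 123 and 456 are now both set
    echo > set_event_pid          # clear the filter entirely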
@@ -2121,6 +2174,10 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file)
2121 trace_create_file("trigger", 0644, file->dir, file, 2174 trace_create_file("trigger", 0644, file->dir, file,
2122 &event_trigger_fops); 2175 &event_trigger_fops);
2123 2176
2177#ifdef CONFIG_HIST_TRIGGERS
2178 trace_create_file("hist", 0444, file->dir, file,
2179 &event_hist_fops);
2180#endif
2124 trace_create_file("format", 0444, file->dir, call, 2181 trace_create_file("format", 0444, file->dir, call,
2125 &ftrace_event_format_fops); 2182 &ftrace_event_format_fops);
2126 2183
@@ -3368,7 +3425,7 @@ static __init void event_trace_self_tests(void)
3368 3425
3369static DEFINE_PER_CPU(atomic_t, ftrace_test_event_disable); 3426static DEFINE_PER_CPU(atomic_t, ftrace_test_event_disable);
3370 3427
3371static struct trace_array *event_tr; 3428static struct trace_event_file event_trace_file __initdata;
3372 3429
3373static void __init 3430static void __init
3374function_test_events_call(unsigned long ip, unsigned long parent_ip, 3431function_test_events_call(unsigned long ip, unsigned long parent_ip,
@@ -3392,17 +3449,17 @@ function_test_events_call(unsigned long ip, unsigned long parent_ip,
3392 3449
3393 local_save_flags(flags); 3450 local_save_flags(flags);
3394 3451
3395 event = trace_current_buffer_lock_reserve(&buffer, 3452 event = trace_event_buffer_lock_reserve(&buffer, &event_trace_file,
3396 TRACE_FN, sizeof(*entry), 3453 TRACE_FN, sizeof(*entry),
3397 flags, pc); 3454 flags, pc);
3398 if (!event) 3455 if (!event)
3399 goto out; 3456 goto out;
3400 entry = ring_buffer_event_data(event); 3457 entry = ring_buffer_event_data(event);
3401 entry->ip = ip; 3458 entry->ip = ip;
3402 entry->parent_ip = parent_ip; 3459 entry->parent_ip = parent_ip;
3403 3460
3404 trace_buffer_unlock_commit(event_tr, buffer, event, flags, pc); 3461 event_trigger_unlock_commit(&event_trace_file, buffer, event,
3405 3462 entry, flags, pc);
3406 out: 3463 out:
3407 atomic_dec(&per_cpu(ftrace_test_event_disable, cpu)); 3464 atomic_dec(&per_cpu(ftrace_test_event_disable, cpu));
3408 preempt_enable_notrace(); 3465 preempt_enable_notrace();
@@ -3417,9 +3474,11 @@ static struct ftrace_ops trace_ops __initdata =
3417static __init void event_trace_self_test_with_function(void) 3474static __init void event_trace_self_test_with_function(void)
3418{ 3475{
3419 int ret; 3476 int ret;
3420 event_tr = top_trace_array(); 3477
3421 if (WARN_ON(!event_tr)) 3478 event_trace_file.tr = top_trace_array();
3479 if (WARN_ON(!event_trace_file.tr))
3422 return; 3480 return;
3481
3423 ret = register_ftrace_function(&trace_ops); 3482 ret = register_ftrace_function(&trace_ops);
3424 if (WARN_ON(ret < 0)) { 3483 if (WARN_ON(ret < 0)) {
3425 pr_info("Failed to enable function tracer for event tests\n"); 3484 pr_info("Failed to enable function tracer for event tests\n");
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index b3f5051cd4e9..9daa9b3bc6d9 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -689,10 +689,7 @@ static void append_filter_err(struct filter_parse_state *ps,
689 689
690static inline struct event_filter *event_filter(struct trace_event_file *file) 690static inline struct event_filter *event_filter(struct trace_event_file *file)
691{ 691{
692 if (file->event_call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) 692 return file->filter;
693 return file->event_call->filter;
694 else
695 return file->filter;
696} 693}
697 694
698/* caller must hold event_mutex */ 695/* caller must hold event_mutex */
@@ -826,12 +823,12 @@ static void __free_preds(struct event_filter *filter)
826 823
827static void filter_disable(struct trace_event_file *file) 824static void filter_disable(struct trace_event_file *file)
828{ 825{
829 struct trace_event_call *call = file->event_call; 826 unsigned long old_flags = file->flags;
830 827
831 if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) 828 file->flags &= ~EVENT_FILE_FL_FILTERED;
832 call->flags &= ~TRACE_EVENT_FL_FILTERED; 829
833 else 830 if (old_flags != file->flags)
834 file->flags &= ~EVENT_FILE_FL_FILTERED; 831 trace_buffered_event_disable();
835} 832}
836 833
837static void __free_filter(struct event_filter *filter) 834static void __free_filter(struct event_filter *filter)
@@ -883,13 +880,8 @@ static int __alloc_preds(struct event_filter *filter, int n_preds)
883 880
884static inline void __remove_filter(struct trace_event_file *file) 881static inline void __remove_filter(struct trace_event_file *file)
885{ 882{
886 struct trace_event_call *call = file->event_call;
887
888 filter_disable(file); 883 filter_disable(file);
889 if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) 884 remove_filter_string(file->filter);
890 remove_filter_string(call->filter);
891 else
892 remove_filter_string(file->filter);
893} 885}
894 886
895static void filter_free_subsystem_preds(struct trace_subsystem_dir *dir, 887static void filter_free_subsystem_preds(struct trace_subsystem_dir *dir,
@@ -906,15 +898,8 @@ static void filter_free_subsystem_preds(struct trace_subsystem_dir *dir,
906 898
907static inline void __free_subsystem_filter(struct trace_event_file *file) 899static inline void __free_subsystem_filter(struct trace_event_file *file)
908{ 900{
909 struct trace_event_call *call = file->event_call; 901 __free_filter(file->filter);
910 902 file->filter = NULL;
911 if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) {
912 __free_filter(call->filter);
913 call->filter = NULL;
914 } else {
915 __free_filter(file->filter);
916 file->filter = NULL;
917 }
918} 903}
919 904
920static void filter_free_subsystem_filters(struct trace_subsystem_dir *dir, 905static void filter_free_subsystem_filters(struct trace_subsystem_dir *dir,
@@ -1718,69 +1703,43 @@ fail:
1718 1703
1719static inline void event_set_filtered_flag(struct trace_event_file *file) 1704static inline void event_set_filtered_flag(struct trace_event_file *file)
1720{ 1705{
1721 struct trace_event_call *call = file->event_call; 1706 unsigned long old_flags = file->flags;
1722 1707
1723 if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) 1708 file->flags |= EVENT_FILE_FL_FILTERED;
1724 call->flags |= TRACE_EVENT_FL_FILTERED; 1709
1725 else 1710 if (old_flags != file->flags)
1726 file->flags |= EVENT_FILE_FL_FILTERED; 1711 trace_buffered_event_enable();
1727} 1712}
1728 1713
1729static inline void event_set_filter(struct trace_event_file *file, 1714static inline void event_set_filter(struct trace_event_file *file,
1730 struct event_filter *filter) 1715 struct event_filter *filter)
1731{ 1716{
1732 struct trace_event_call *call = file->event_call; 1717 rcu_assign_pointer(file->filter, filter);
1733
1734 if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER)
1735 rcu_assign_pointer(call->filter, filter);
1736 else
1737 rcu_assign_pointer(file->filter, filter);
1738} 1718}
1739 1719
1740static inline void event_clear_filter(struct trace_event_file *file) 1720static inline void event_clear_filter(struct trace_event_file *file)
1741{ 1721{
1742 struct trace_event_call *call = file->event_call; 1722 RCU_INIT_POINTER(file->filter, NULL);
1743
1744 if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER)
1745 RCU_INIT_POINTER(call->filter, NULL);
1746 else
1747 RCU_INIT_POINTER(file->filter, NULL);
1748} 1723}
1749 1724
1750static inline void 1725static inline void
1751event_set_no_set_filter_flag(struct trace_event_file *file) 1726event_set_no_set_filter_flag(struct trace_event_file *file)
1752{ 1727{
1753 struct trace_event_call *call = file->event_call; 1728 file->flags |= EVENT_FILE_FL_NO_SET_FILTER;
1754
1755 if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER)
1756 call->flags |= TRACE_EVENT_FL_NO_SET_FILTER;
1757 else
1758 file->flags |= EVENT_FILE_FL_NO_SET_FILTER;
1759} 1729}
1760 1730
1761static inline void 1731static inline void
1762event_clear_no_set_filter_flag(struct trace_event_file *file) 1732event_clear_no_set_filter_flag(struct trace_event_file *file)
1763{ 1733{
1764 struct trace_event_call *call = file->event_call; 1734 file->flags &= ~EVENT_FILE_FL_NO_SET_FILTER;
1765
1766 if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER)
1767 call->flags &= ~TRACE_EVENT_FL_NO_SET_FILTER;
1768 else
1769 file->flags &= ~EVENT_FILE_FL_NO_SET_FILTER;
1770} 1735}
1771 1736
1772static inline bool 1737static inline bool
1773event_no_set_filter_flag(struct trace_event_file *file) 1738event_no_set_filter_flag(struct trace_event_file *file)
1774{ 1739{
1775 struct trace_event_call *call = file->event_call;
1776
1777 if (file->flags & EVENT_FILE_FL_NO_SET_FILTER) 1740 if (file->flags & EVENT_FILE_FL_NO_SET_FILTER)
1778 return true; 1741 return true;
1779 1742
1780 if ((call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) &&
1781 (call->flags & TRACE_EVENT_FL_NO_SET_FILTER))
1782 return true;
1783
1784 return false; 1743 return false;
1785} 1744}
1786 1745
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
new file mode 100644
index 000000000000..0c05b8a99806
--- /dev/null
+++ b/kernel/trace/trace_events_hist.c
@@ -0,0 +1,1755 @@
1/*
2 * trace_events_hist - trace event hist triggers
3 *
4 * This program is free software; you can redistribute it and/or modify
5 * it under the terms of the GNU General Public License as published by
6 * the Free Software Foundation; either version 2 of the License, or
7 * (at your option) any later version.
8 *
9 * This program is distributed in the hope that it will be useful,
10 * but WITHOUT ANY WARRANTY; without even the implied warranty of
11 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
12 * GNU General Public License for more details.
13 *
14 * Copyright (C) 2015 Tom Zanussi <tom.zanussi@linux.intel.com>
15 */
16
17#include <linux/module.h>
18#include <linux/kallsyms.h>
19#include <linux/mutex.h>
20#include <linux/slab.h>
21#include <linux/stacktrace.h>
22
23#include "tracing_map.h"
24#include "trace.h"
25
26struct hist_field;
27
28typedef u64 (*hist_field_fn_t) (struct hist_field *field, void *event);
29
30struct hist_field {
31 struct ftrace_event_field *field;
32 unsigned long flags;
33 hist_field_fn_t fn;
34 unsigned int size;
35 unsigned int offset;
36};
37
38static u64 hist_field_none(struct hist_field *field, void *event)
39{
40 return 0;
41}
42
43static u64 hist_field_counter(struct hist_field *field, void *event)
44{
45 return 1;
46}
47
48static u64 hist_field_string(struct hist_field *hist_field, void *event)
49{
50 char *addr = (char *)(event + hist_field->field->offset);
51
52 return (u64)(unsigned long)addr;
53}
54
55static u64 hist_field_dynstring(struct hist_field *hist_field, void *event)
56{
57 u32 str_item = *(u32 *)(event + hist_field->field->offset);
58 int str_loc = str_item & 0xffff;
59 char *addr = (char *)(event + str_loc);
60
61 return (u64)(unsigned long)addr;
62}
63
64static u64 hist_field_pstring(struct hist_field *hist_field, void *event)
65{
66 char **addr = (char **)(event + hist_field->field->offset);
67
68 return (u64)(unsigned long)*addr;
69}
70
71static u64 hist_field_log2(struct hist_field *hist_field, void *event)
72{
73 u64 val = *(u64 *)(event + hist_field->field->offset);
74
75 return (u64) ilog2(roundup_pow_of_two(val));
76}
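
[Editor's note] Worked example of the bucketing above:

    val = 1500
    roundup_pow_of_two(1500) = 2048
    ilog2(2048) = 11    ->  bucket key 11, covering (1024, 2048]

so every value in the same power-of-two range collapses into one key, which is what the '.log2' key modifier relies on.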
77
78#define DEFINE_HIST_FIELD_FN(type) \
79static u64 hist_field_##type(struct hist_field *hist_field, void *event)\
80{ \
81 type *addr = (type *)(event + hist_field->field->offset); \
82 \
83 return (u64)(unsigned long)*addr; \
84}
85
86DEFINE_HIST_FIELD_FN(s64);
87DEFINE_HIST_FIELD_FN(u64);
88DEFINE_HIST_FIELD_FN(s32);
89DEFINE_HIST_FIELD_FN(u32);
90DEFINE_HIST_FIELD_FN(s16);
91DEFINE_HIST_FIELD_FN(u16);
92DEFINE_HIST_FIELD_FN(s8);
93DEFINE_HIST_FIELD_FN(u8);
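
[Editor's note] For reference, DEFINE_HIST_FIELD_FN(u32) expands to the fetch function below; one such function exists per signed/unsigned integer width, selected at trigger-creation time by select_value_fn() further down:

    static u64 hist_field_u32(struct hist_field *hist_field, void *event)
    {
            u32 *addr = (u32 *)(event + hist_field->field->offset);

            return (u64)(unsigned long)*addr;
    }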
94
95#define for_each_hist_field(i, hist_data) \
96 for ((i) = 0; (i) < (hist_data)->n_fields; (i)++)
97
98#define for_each_hist_val_field(i, hist_data) \
99 for ((i) = 0; (i) < (hist_data)->n_vals; (i)++)
100
101#define for_each_hist_key_field(i, hist_data) \
102 for ((i) = (hist_data)->n_vals; (i) < (hist_data)->n_fields; (i)++)
103
104#define HIST_STACKTRACE_DEPTH 16
105#define HIST_STACKTRACE_SIZE (HIST_STACKTRACE_DEPTH * sizeof(unsigned long))
106#define HIST_STACKTRACE_SKIP 5
107
108#define HITCOUNT_IDX 0
109#define HIST_KEY_SIZE_MAX (MAX_FILTER_STR_VAL + HIST_STACKTRACE_SIZE)
110
111enum hist_field_flags {
112 HIST_FIELD_FL_HITCOUNT = 1,
113 HIST_FIELD_FL_KEY = 2,
114 HIST_FIELD_FL_STRING = 4,
115 HIST_FIELD_FL_HEX = 8,
116 HIST_FIELD_FL_SYM = 16,
117 HIST_FIELD_FL_SYM_OFFSET = 32,
118 HIST_FIELD_FL_EXECNAME = 64,
119 HIST_FIELD_FL_SYSCALL = 128,
120 HIST_FIELD_FL_STACKTRACE = 256,
121 HIST_FIELD_FL_LOG2 = 512,
122};
123
124struct hist_trigger_attrs {
125 char *keys_str;
126 char *vals_str;
127 char *sort_key_str;
128 char *name;
129 bool pause;
130 bool cont;
131 bool clear;
132 unsigned int map_bits;
133};
134
135struct hist_trigger_data {
136 struct hist_field *fields[TRACING_MAP_FIELDS_MAX];
137 unsigned int n_vals;
138 unsigned int n_keys;
139 unsigned int n_fields;
140 unsigned int key_size;
141 struct tracing_map_sort_key sort_keys[TRACING_MAP_SORT_KEYS_MAX];
142 unsigned int n_sort_keys;
143 struct trace_event_file *event_file;
144 struct hist_trigger_attrs *attrs;
145 struct tracing_map *map;
146};
147
148static hist_field_fn_t select_value_fn(int field_size, int field_is_signed)
149{
150 hist_field_fn_t fn = NULL;
151
152 switch (field_size) {
153 case 8:
154 if (field_is_signed)
155 fn = hist_field_s64;
156 else
157 fn = hist_field_u64;
158 break;
159 case 4:
160 if (field_is_signed)
161 fn = hist_field_s32;
162 else
163 fn = hist_field_u32;
164 break;
165 case 2:
166 if (field_is_signed)
167 fn = hist_field_s16;
168 else
169 fn = hist_field_u16;
170 break;
171 case 1:
172 if (field_is_signed)
173 fn = hist_field_s8;
174 else
175 fn = hist_field_u8;
176 break;
177 }
178
179 return fn;
180}
181
182static int parse_map_size(char *str)
183{
184 unsigned long size, map_bits;
185 int ret;
186
187 strsep(&str, "=");
188 if (!str) {
189 ret = -EINVAL;
190 goto out;
191 }
192
193 ret = kstrtoul(str, 0, &size);
194 if (ret)
195 goto out;
196
197 map_bits = ilog2(roundup_pow_of_two(size));
198 if (map_bits < TRACING_MAP_BITS_MIN ||
199 map_bits > TRACING_MAP_BITS_MAX)
200 ret = -EINVAL;
201 else
202 ret = map_bits;
203 out:
204 return ret;
205}
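
[Editor's note] Worked example: 'size=3000' gives roundup_pow_of_two(3000) = 4096 and ilog2(4096) = 12, so map_bits = 12 and the tracing map will hold 2^12 = 4096 entries; anything outside TRACING_MAP_BITS_MIN..TRACING_MAP_BITS_MAX is rejected with -EINVAL.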
206
207static void destroy_hist_trigger_attrs(struct hist_trigger_attrs *attrs)
208{
209 if (!attrs)
210 return;
211
212 kfree(attrs->name);
213 kfree(attrs->sort_key_str);
214 kfree(attrs->keys_str);
215 kfree(attrs->vals_str);
216 kfree(attrs);
217}
218
219static struct hist_trigger_attrs *parse_hist_trigger_attrs(char *trigger_str)
220{
221 struct hist_trigger_attrs *attrs;
222 int ret = 0;
223
224 attrs = kzalloc(sizeof(*attrs), GFP_KERNEL);
225 if (!attrs)
226 return ERR_PTR(-ENOMEM);
227
228 while (trigger_str) {
229 char *str = strsep(&trigger_str, ":");
230
231 if ((strncmp(str, "key=", strlen("key=")) == 0) ||
232 (strncmp(str, "keys=", strlen("keys=")) == 0))
233 attrs->keys_str = kstrdup(str, GFP_KERNEL);
234 else if ((strncmp(str, "val=", strlen("val=")) == 0) ||
235 (strncmp(str, "vals=", strlen("vals=")) == 0) ||
236 (strncmp(str, "values=", strlen("values=")) == 0))
237 attrs->vals_str = kstrdup(str, GFP_KERNEL);
238 else if (strncmp(str, "sort=", strlen("sort=")) == 0)
239 attrs->sort_key_str = kstrdup(str, GFP_KERNEL);
240 else if (strncmp(str, "name=", strlen("name=")) == 0)
241 attrs->name = kstrdup(str, GFP_KERNEL);
242 else if (strcmp(str, "pause") == 0)
243 attrs->pause = true;
244 else if ((strcmp(str, "cont") == 0) ||
245 (strcmp(str, "continue") == 0))
246 attrs->cont = true;
247 else if (strcmp(str, "clear") == 0)
248 attrs->clear = true;
249 else if (strncmp(str, "size=", strlen("size=")) == 0) {
250 int map_bits = parse_map_size(str);
251
252 if (map_bits < 0) {
253 ret = map_bits;
254 goto free;
255 }
256 attrs->map_bits = map_bits;
257 } else {
258 ret = -EINVAL;
259 goto free;
260 }
261 }
262
263 if (!attrs->keys_str) {
264 ret = -EINVAL;
265 goto free;
266 }
267
268 return attrs;
269 free:
270 destroy_hist_trigger_attrs(attrs);
271
272 return ERR_PTR(ret);
273}
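
[Editor's note] Worked example of the loop above, assuming the trigger layer has already stripped the leading 'hist:' before this function runs. The string

    keys=common_pid.execname:vals=hitcount:sort=hitcount.descending:size=2048

splits on ':' into keys_str = "keys=common_pid.execname", vals_str = "vals=hitcount", sort_key_str = "sort=hitcount.descending", and map_bits = 11; the 'keys='/'vals='/'sort=' prefixes are kept here and stripped later by the create_*_fields() parsers via strsep(..., "=").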
274
275static inline void save_comm(char *comm, struct task_struct *task)
276{
277 if (!task->pid) {
278 strcpy(comm, "<idle>");
279 return;
280 }
281
282 if (WARN_ON_ONCE(task->pid < 0)) {
283 strcpy(comm, "<XXX>");
284 return;
285 }
286
287 memcpy(comm, task->comm, TASK_COMM_LEN);
288}
289
290static void hist_trigger_elt_comm_free(struct tracing_map_elt *elt)
291{
292 kfree((char *)elt->private_data);
293}
294
295static int hist_trigger_elt_comm_alloc(struct tracing_map_elt *elt)
296{
297 struct hist_trigger_data *hist_data = elt->map->private_data;
298 struct hist_field *key_field;
299 unsigned int i;
300
301 for_each_hist_key_field(i, hist_data) {
302 key_field = hist_data->fields[i];
303
304 if (key_field->flags & HIST_FIELD_FL_EXECNAME) {
305 unsigned int size = TASK_COMM_LEN + 1;
306
307 elt->private_data = kzalloc(size, GFP_KERNEL);
308 if (!elt->private_data)
309 return -ENOMEM;
310 break;
311 }
312 }
313
314 return 0;
315}
316
317static void hist_trigger_elt_comm_copy(struct tracing_map_elt *to,
318 struct tracing_map_elt *from)
319{
320 char *comm_from = from->private_data;
321 char *comm_to = to->private_data;
322
323 if (comm_from)
324 memcpy(comm_to, comm_from, TASK_COMM_LEN + 1);
325}
326
327static void hist_trigger_elt_comm_init(struct tracing_map_elt *elt)
328{
329 char *comm = elt->private_data;
330
331 if (comm)
332 save_comm(comm, current);
333}
334
335static const struct tracing_map_ops hist_trigger_elt_comm_ops = {
336 .elt_alloc = hist_trigger_elt_comm_alloc,
337 .elt_copy = hist_trigger_elt_comm_copy,
338 .elt_free = hist_trigger_elt_comm_free,
339 .elt_init = hist_trigger_elt_comm_init,
340};
341
342static void destroy_hist_field(struct hist_field *hist_field)
343{
344 kfree(hist_field);
345}
346
347static struct hist_field *create_hist_field(struct ftrace_event_field *field,
348 unsigned long flags)
349{
350 struct hist_field *hist_field;
351
352 if (field && is_function_field(field))
353 return NULL;
354
355 hist_field = kzalloc(sizeof(struct hist_field), GFP_KERNEL);
356 if (!hist_field)
357 return NULL;
358
359 if (flags & HIST_FIELD_FL_HITCOUNT) {
360 hist_field->fn = hist_field_counter;
361 goto out;
362 }
363
364 if (flags & HIST_FIELD_FL_STACKTRACE) {
365 hist_field->fn = hist_field_none;
366 goto out;
367 }
368
369 if (flags & HIST_FIELD_FL_LOG2) {
370 hist_field->fn = hist_field_log2;
371 goto out;
372 }
373
374 if (WARN_ON_ONCE(!field))
375 goto out;
376
377 if (is_string_field(field)) {
378 flags |= HIST_FIELD_FL_STRING;
379
380 if (field->filter_type == FILTER_STATIC_STRING)
381 hist_field->fn = hist_field_string;
382 else if (field->filter_type == FILTER_DYN_STRING)
383 hist_field->fn = hist_field_dynstring;
384 else
385 hist_field->fn = hist_field_pstring;
386 } else {
387 hist_field->fn = select_value_fn(field->size,
388 field->is_signed);
389 if (!hist_field->fn) {
390 destroy_hist_field(hist_field);
391 return NULL;
392 }
393 }
394 out:
395 hist_field->field = field;
396 hist_field->flags = flags;
397
398 return hist_field;
399}
400
401static void destroy_hist_fields(struct hist_trigger_data *hist_data)
402{
403 unsigned int i;
404
405 for (i = 0; i < TRACING_MAP_FIELDS_MAX; i++) {
406 if (hist_data->fields[i]) {
407 destroy_hist_field(hist_data->fields[i]);
408 hist_data->fields[i] = NULL;
409 }
410 }
411}
412
413static int create_hitcount_val(struct hist_trigger_data *hist_data)
414{
415 hist_data->fields[HITCOUNT_IDX] =
416 create_hist_field(NULL, HIST_FIELD_FL_HITCOUNT);
417 if (!hist_data->fields[HITCOUNT_IDX])
418 return -ENOMEM;
419
420 hist_data->n_vals++;
421
422 if (WARN_ON(hist_data->n_vals > TRACING_MAP_VALS_MAX))
423 return -EINVAL;
424
425 return 0;
426}
427
428static int create_val_field(struct hist_trigger_data *hist_data,
429 unsigned int val_idx,
430 struct trace_event_file *file,
431 char *field_str)
432{
433 struct ftrace_event_field *field = NULL;
434 unsigned long flags = 0;
435 char *field_name;
436 int ret = 0;
437
438 if (WARN_ON(val_idx >= TRACING_MAP_VALS_MAX))
439 return -EINVAL;
440
441 field_name = strsep(&field_str, ".");
442 if (field_str) {
443 if (strcmp(field_str, "hex") == 0)
444 flags |= HIST_FIELD_FL_HEX;
445 else {
446 ret = -EINVAL;
447 goto out;
448 }
449 }
450
451 field = trace_find_event_field(file->event_call, field_name);
452 if (!field) {
453 ret = -EINVAL;
454 goto out;
455 }
456
457 hist_data->fields[val_idx] = create_hist_field(field, flags);
458 if (!hist_data->fields[val_idx]) {
459 ret = -ENOMEM;
460 goto out;
461 }
462
463 ++hist_data->n_vals;
464
465 if (WARN_ON(hist_data->n_vals > TRACING_MAP_VALS_MAX))
466 ret = -EINVAL;
467 out:
468 return ret;
469}
470
471static int create_val_fields(struct hist_trigger_data *hist_data,
472 struct trace_event_file *file)
473{
474 char *fields_str, *field_str;
475 unsigned int i, j;
476 int ret;
477
478 ret = create_hitcount_val(hist_data);
479 if (ret)
480 goto out;
481
482 fields_str = hist_data->attrs->vals_str;
483 if (!fields_str)
484 goto out;
485
486 strsep(&fields_str, "=");
487 if (!fields_str)
488 goto out;
489
490 for (i = 0, j = 1; i < TRACING_MAP_VALS_MAX &&
491 j < TRACING_MAP_VALS_MAX; i++) {
492 field_str = strsep(&fields_str, ",");
493 if (!field_str)
494 break;
495 if (strcmp(field_str, "hitcount") == 0)
496 continue;
497 ret = create_val_field(hist_data, j++, file, field_str);
498 if (ret)
499 goto out;
500 }
501 if (fields_str && (strcmp(fields_str, "hitcount") != 0))
502 ret = -EINVAL;
503 out:
504 return ret;
505}
506
507static int create_key_field(struct hist_trigger_data *hist_data,
508 unsigned int key_idx,
509 unsigned int key_offset,
510 struct trace_event_file *file,
511 char *field_str)
512{
513 struct ftrace_event_field *field = NULL;
514 unsigned long flags = 0;
515 unsigned int key_size;
516 int ret = 0;
517
518 if (WARN_ON(key_idx >= TRACING_MAP_FIELDS_MAX))
519 return -EINVAL;
520
521 flags |= HIST_FIELD_FL_KEY;
522
523 if (strcmp(field_str, "stacktrace") == 0) {
524 flags |= HIST_FIELD_FL_STACKTRACE;
525 key_size = sizeof(unsigned long) * HIST_STACKTRACE_DEPTH;
526 } else {
527 char *field_name = strsep(&field_str, ".");
528
529 if (field_str) {
530 if (strcmp(field_str, "hex") == 0)
531 flags |= HIST_FIELD_FL_HEX;
532 else if (strcmp(field_str, "sym") == 0)
533 flags |= HIST_FIELD_FL_SYM;
534 else if (strcmp(field_str, "sym-offset") == 0)
535 flags |= HIST_FIELD_FL_SYM_OFFSET;
536 else if ((strcmp(field_str, "execname") == 0) &&
537 (strcmp(field_name, "common_pid") == 0))
538 flags |= HIST_FIELD_FL_EXECNAME;
539 else if (strcmp(field_str, "syscall") == 0)
540 flags |= HIST_FIELD_FL_SYSCALL;
541 else if (strcmp(field_str, "log2") == 0)
542 flags |= HIST_FIELD_FL_LOG2;
543 else {
544 ret = -EINVAL;
545 goto out;
546 }
547 }
548
549 field = trace_find_event_field(file->event_call, field_name);
550 if (!field) {
551 ret = -EINVAL;
552 goto out;
553 }
554
555 if (is_string_field(field))
556 key_size = MAX_FILTER_STR_VAL;
557 else
558 key_size = field->size;
559 }
560
561 hist_data->fields[key_idx] = create_hist_field(field, flags);
562 if (!hist_data->fields[key_idx]) {
563 ret = -ENOMEM;
564 goto out;
565 }
566
567 key_size = ALIGN(key_size, sizeof(u64));
568 hist_data->fields[key_idx]->size = key_size;
569 hist_data->fields[key_idx]->offset = key_offset;
570 hist_data->key_size += key_size;
571 if (hist_data->key_size > HIST_KEY_SIZE_MAX) {
572 ret = -EINVAL;
573 goto out;
574 }
575
576 hist_data->n_keys++;
577
578 if (WARN_ON(hist_data->n_keys > TRACING_MAP_KEYS_MAX))
579 return -EINVAL;
580
581 ret = key_size;
582 out:
583 return ret;
584}
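
[Editor's note] Worked example of the sizing: a 4-byte key such as common_pid is padded to ALIGN(4, sizeof(u64)) = 8, so a compound key 'keys=common_pid,call_site' places common_pid at offset 0 and call_site (8 bytes) at offset 8, for a total key_size of 16. The positive return value is this aligned size, which create_key_fields() below accumulates into the next key_offset.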
585
586static int create_key_fields(struct hist_trigger_data *hist_data,
587 struct trace_event_file *file)
588{
589 unsigned int i, key_offset = 0, n_vals = hist_data->n_vals;
590 char *fields_str, *field_str;
591 int ret = -EINVAL;
592
593 fields_str = hist_data->attrs->keys_str;
594 if (!fields_str)
595 goto out;
596
597 strsep(&fields_str, "=");
598 if (!fields_str)
599 goto out;
600
601 for (i = n_vals; i < n_vals + TRACING_MAP_KEYS_MAX; i++) {
602 field_str = strsep(&fields_str, ",");
603 if (!field_str)
604 break;
605 ret = create_key_field(hist_data, i, key_offset,
606 file, field_str);
607 if (ret < 0)
608 goto out;
609 key_offset += ret;
610 }
611 if (fields_str) {
612 ret = -EINVAL;
613 goto out;
614 }
615 ret = 0;
616 out:
617 return ret;
618}
619
620static int create_hist_fields(struct hist_trigger_data *hist_data,
621 struct trace_event_file *file)
622{
623 int ret;
624
625 ret = create_val_fields(hist_data, file);
626 if (ret)
627 goto out;
628
629 ret = create_key_fields(hist_data, file);
630 if (ret)
631 goto out;
632
633 hist_data->n_fields = hist_data->n_vals + hist_data->n_keys;
634 out:
635 return ret;
636}
637
638static int is_descending(const char *str)
639{
640 if (!str)
641 return 0;
642
643 if (strcmp(str, "descending") == 0)
644 return 1;
645
646 if (strcmp(str, "ascending") == 0)
647 return 0;
648
649 return -EINVAL;
650}
651
652static int create_sort_keys(struct hist_trigger_data *hist_data)
653{
654 char *fields_str = hist_data->attrs->sort_key_str;
655 struct ftrace_event_field *field = NULL;
656 struct tracing_map_sort_key *sort_key;
657 int descending, ret = 0;
658 unsigned int i, j;
659
660 hist_data->n_sort_keys = 1; /* we always have at least one, hitcount */
661
662 if (!fields_str)
663 goto out;
664
665 strsep(&fields_str, "=");
666 if (!fields_str) {
667 ret = -EINVAL;
668 goto out;
669 }
670
671 for (i = 0; i < TRACING_MAP_SORT_KEYS_MAX; i++) {
672 char *field_str, *field_name;
673
674 sort_key = &hist_data->sort_keys[i];
675
676 field_str = strsep(&fields_str, ",");
677 if (!field_str) {
678 if (i == 0)
679 ret = -EINVAL;
680 break;
681 }
682
683 if ((i == TRACING_MAP_SORT_KEYS_MAX - 1) && fields_str) {
684 ret = -EINVAL;
685 break;
686 }
687
688 field_name = strsep(&field_str, ".");
689 if (!field_name) {
690 ret = -EINVAL;
691 break;
692 }
693
694 if (strcmp(field_name, "hitcount") == 0) {
695 descending = is_descending(field_str);
696 if (descending < 0) {
697 ret = descending;
698 break;
699 }
700 sort_key->descending = descending;
701 continue;
702 }
703
704 for (j = 1; j < hist_data->n_fields; j++) {
705 field = hist_data->fields[j]->field;
706 if (field && (strcmp(field_name, field->name) == 0)) {
707 sort_key->field_idx = j;
708 descending = is_descending(field_str);
709 if (descending < 0) {
710 ret = descending;
711 goto out;
712 }
713 sort_key->descending = descending;
714 break;
715 }
716 }
717 if (j == hist_data->n_fields) {
718 ret = -EINVAL;
719 break;
720 }
721 }
722 hist_data->n_sort_keys = i;
723 out:
724 return ret;
725}
726
727static void destroy_hist_data(struct hist_trigger_data *hist_data)
728{
729 destroy_hist_trigger_attrs(hist_data->attrs);
730 destroy_hist_fields(hist_data);
731 tracing_map_destroy(hist_data->map);
732 kfree(hist_data);
733}
734
735static int create_tracing_map_fields(struct hist_trigger_data *hist_data)
736{
737 struct tracing_map *map = hist_data->map;
738 struct ftrace_event_field *field;
739 struct hist_field *hist_field;
740 int i, idx;
741
742 for_each_hist_field(i, hist_data) {
743 hist_field = hist_data->fields[i];
744 if (hist_field->flags & HIST_FIELD_FL_KEY) {
745 tracing_map_cmp_fn_t cmp_fn;
746
747 field = hist_field->field;
748
749 if (hist_field->flags & HIST_FIELD_FL_STACKTRACE)
750 cmp_fn = tracing_map_cmp_none;
751 else if (is_string_field(field))
752 cmp_fn = tracing_map_cmp_string;
753 else
754 cmp_fn = tracing_map_cmp_num(field->size,
755 field->is_signed);
756 idx = tracing_map_add_key_field(map,
757 hist_field->offset,
758 cmp_fn);
759
760 } else
761 idx = tracing_map_add_sum_field(map);
762
763 if (idx < 0)
764 return idx;
765 }
766
767 return 0;
768}
769
770static bool need_tracing_map_ops(struct hist_trigger_data *hist_data)
771{
772 struct hist_field *key_field;
773 unsigned int i;
774
775 for_each_hist_key_field(i, hist_data) {
776 key_field = hist_data->fields[i];
777
778 if (key_field->flags & HIST_FIELD_FL_EXECNAME)
779 return true;
780 }
781
782 return false;
783}
784
785static struct hist_trigger_data *
786create_hist_data(unsigned int map_bits,
787 struct hist_trigger_attrs *attrs,
788 struct trace_event_file *file)
789{
790 const struct tracing_map_ops *map_ops = NULL;
791 struct hist_trigger_data *hist_data;
792 int ret = 0;
793
794 hist_data = kzalloc(sizeof(*hist_data), GFP_KERNEL);
795 if (!hist_data)
796 return ERR_PTR(-ENOMEM);
797
798 hist_data->attrs = attrs;
799
800 ret = create_hist_fields(hist_data, file);
801 if (ret)
802 goto free;
803
804 ret = create_sort_keys(hist_data);
805 if (ret)
806 goto free;
807
808 if (need_tracing_map_ops(hist_data))
809 map_ops = &hist_trigger_elt_comm_ops;
810
811 hist_data->map = tracing_map_create(map_bits, hist_data->key_size,
812 map_ops, hist_data);
813 if (IS_ERR(hist_data->map)) {
814 ret = PTR_ERR(hist_data->map);
815 hist_data->map = NULL;
816 goto free;
817 }
818
819 ret = create_tracing_map_fields(hist_data);
820 if (ret)
821 goto free;
822
823 ret = tracing_map_init(hist_data->map);
824 if (ret)
825 goto free;
826
827 hist_data->event_file = file;
828 out:
829 return hist_data;
830 free:
831 hist_data->attrs = NULL;
832
833 destroy_hist_data(hist_data);
834
835 hist_data = ERR_PTR(ret);
836
837 goto out;
838}
839
840static void hist_trigger_elt_update(struct hist_trigger_data *hist_data,
841 struct tracing_map_elt *elt,
842 void *rec)
843{
844 struct hist_field *hist_field;
845 unsigned int i;
846 u64 hist_val;
847
848 for_each_hist_val_field(i, hist_data) {
849 hist_field = hist_data->fields[i];
850 hist_val = hist_field->fn(hist_field, rec);
851 tracing_map_update_sum(elt, i, hist_val);
852 }
853}
854
855static inline void add_to_key(char *compound_key, void *key,
856 struct hist_field *key_field, void *rec)
857{
858 size_t size = key_field->size;
859
860 if (key_field->flags & HIST_FIELD_FL_STRING) {
861 struct ftrace_event_field *field;
862
863 field = key_field->field;
864 if (field->filter_type == FILTER_DYN_STRING)
865 size = *(u32 *)(rec + field->offset) >> 16;
866 else if (field->filter_type == FILTER_PTR_STRING)
867 size = strlen(key);
868 else if (field->filter_type == FILTER_STATIC_STRING)
869 size = field->size;
870
871 /* ensure NULL-termination */
872 if (size > key_field->size - 1)
873 size = key_field->size - 1;
874 }
875
876 memcpy(compound_key + key_field->offset, key, size);
877}
878
879static void event_hist_trigger(struct event_trigger_data *data, void *rec)
880{
881 struct hist_trigger_data *hist_data = data->private_data;
882 bool use_compound_key = (hist_data->n_keys > 1);
883 unsigned long entries[HIST_STACKTRACE_DEPTH];
884 char compound_key[HIST_KEY_SIZE_MAX];
885 struct stack_trace stacktrace;
886 struct hist_field *key_field;
887 struct tracing_map_elt *elt;
888 u64 field_contents;
889 void *key = NULL;
890 unsigned int i;
891
892 memset(compound_key, 0, hist_data->key_size);
893
894 for_each_hist_key_field(i, hist_data) {
895 key_field = hist_data->fields[i];
896
897 if (key_field->flags & HIST_FIELD_FL_STACKTRACE) {
898 stacktrace.max_entries = HIST_STACKTRACE_DEPTH;
899 stacktrace.entries = entries;
900 stacktrace.nr_entries = 0;
901 stacktrace.skip = HIST_STACKTRACE_SKIP;
902
903 memset(stacktrace.entries, 0, HIST_STACKTRACE_SIZE);
904 save_stack_trace(&stacktrace);
905
906 key = entries;
907 } else {
908 field_contents = key_field->fn(key_field, rec);
909 if (key_field->flags & HIST_FIELD_FL_STRING) {
910 key = (void *)(unsigned long)field_contents;
911 use_compound_key = true;
912 } else
913 key = (void *)&field_contents;
914 }
915
916 if (use_compound_key)
917 add_to_key(compound_key, key, key_field, rec);
918 }
919
920 if (use_compound_key)
921 key = compound_key;
922
923 elt = tracing_map_insert(hist_data->map, key);
924 if (elt)
925 hist_trigger_elt_update(hist_data, elt, rec);
926}
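
[Editor's note] Tracing the compound-key path above for 'keys=common_pid,call_site' (layout per create_key_field()):

    offset 0: common_pid, padded to 8 bytes
    offset 8: call_site
    key = compound_key;   /* 16 zero-initialized bytes, filled by add_to_key() */
    elt = tracing_map_insert(hist_data->map, key);

Note that a string key sets use_compound_key even for a single-key trigger, so the map always hashes the fixed-size, zero-padded copy rather than a pointer into the trace record.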
927
928static void hist_trigger_stacktrace_print(struct seq_file *m,
929 unsigned long *stacktrace_entries,
930 unsigned int max_entries)
931{
932 char str[KSYM_SYMBOL_LEN];
933 unsigned int spaces = 8;
934 unsigned int i;
935
936 for (i = 0; i < max_entries; i++) {
937 if (stacktrace_entries[i] == ULONG_MAX)
938 return;
939
940 seq_printf(m, "%*c", 1 + spaces, ' ');
941 sprint_symbol(str, stacktrace_entries[i]);
942 seq_printf(m, "%s\n", str);
943 }
944}
945
946static void
947hist_trigger_entry_print(struct seq_file *m,
948 struct hist_trigger_data *hist_data, void *key,
949 struct tracing_map_elt *elt)
950{
951 struct hist_field *key_field;
952 char str[KSYM_SYMBOL_LEN];
953 bool multiline = false;
954 unsigned int i;
955 u64 uval;
956
957 seq_puts(m, "{ ");
958
959 for_each_hist_key_field(i, hist_data) {
960 key_field = hist_data->fields[i];
961
962 if (i > hist_data->n_vals)
963 seq_puts(m, ", ");
964
965 if (key_field->flags & HIST_FIELD_FL_HEX) {
966 uval = *(u64 *)(key + key_field->offset);
967 seq_printf(m, "%s: %llx",
968 key_field->field->name, uval);
969 } else if (key_field->flags & HIST_FIELD_FL_SYM) {
970 uval = *(u64 *)(key + key_field->offset);
971 sprint_symbol_no_offset(str, uval);
972 seq_printf(m, "%s: [%llx] %-45s",
973 key_field->field->name, uval, str);
974 } else if (key_field->flags & HIST_FIELD_FL_SYM_OFFSET) {
975 uval = *(u64 *)(key + key_field->offset);
976 sprint_symbol(str, uval);
977 seq_printf(m, "%s: [%llx] %-55s",
978 key_field->field->name, uval, str);
979 } else if (key_field->flags & HIST_FIELD_FL_EXECNAME) {
980 char *comm = elt->private_data;
981
982 uval = *(u64 *)(key + key_field->offset);
983 seq_printf(m, "%s: %-16s[%10llu]",
984 key_field->field->name, comm, uval);
985 } else if (key_field->flags & HIST_FIELD_FL_SYSCALL) {
986 const char *syscall_name;
987
988 uval = *(u64 *)(key + key_field->offset);
989 syscall_name = get_syscall_name(uval);
990 if (!syscall_name)
991 syscall_name = "unknown_syscall";
992
993 seq_printf(m, "%s: %-30s[%3llu]",
994 key_field->field->name, syscall_name, uval);
995 } else if (key_field->flags & HIST_FIELD_FL_STACKTRACE) {
996 seq_puts(m, "stacktrace:\n");
997 hist_trigger_stacktrace_print(m,
998 key + key_field->offset,
999 HIST_STACKTRACE_DEPTH);
1000 multiline = true;
1001 } else if (key_field->flags & HIST_FIELD_FL_LOG2) {
1002 seq_printf(m, "%s: ~ 2^%-2llu", key_field->field->name,
1003 *(u64 *)(key + key_field->offset));
1004 } else if (key_field->flags & HIST_FIELD_FL_STRING) {
1005 seq_printf(m, "%s: %-50s", key_field->field->name,
1006 (char *)(key + key_field->offset));
1007 } else {
1008 uval = *(u64 *)(key + key_field->offset);
1009 seq_printf(m, "%s: %10llu", key_field->field->name,
1010 uval);
1011 }
1012 }
1013
1014 if (!multiline)
1015 seq_puts(m, " ");
1016
1017 seq_puts(m, "}");
1018
1019 seq_printf(m, " hitcount: %10llu",
1020 tracing_map_read_sum(elt, HITCOUNT_IDX));
1021
1022 for (i = 1; i < hist_data->n_vals; i++) {
1023 if (hist_data->fields[i]->flags & HIST_FIELD_FL_HEX) {
1024 seq_printf(m, " %s: %10llx",
1025 hist_data->fields[i]->field->name,
1026 tracing_map_read_sum(elt, i));
1027 } else {
1028 seq_printf(m, " %s: %10llu",
1029 hist_data->fields[i]->field->name,
1030 tracing_map_read_sum(elt, i));
1031 }
1032 }
1033
1034 seq_puts(m, "\n");
1035}
1036
1037static int print_entries(struct seq_file *m,
1038 struct hist_trigger_data *hist_data)
1039{
1040 struct tracing_map_sort_entry **sort_entries = NULL;
1041 struct tracing_map *map = hist_data->map;
1042 int i, n_entries;
1043
1044 n_entries = tracing_map_sort_entries(map, hist_data->sort_keys,
1045 hist_data->n_sort_keys,
1046 &sort_entries);
1047 if (n_entries < 0)
1048 return n_entries;
1049
1050 for (i = 0; i < n_entries; i++)
1051 hist_trigger_entry_print(m, hist_data,
1052 sort_entries[i]->key,
1053 sort_entries[i]->elt);
1054
1055 tracing_map_destroy_sort_entries(sort_entries, n_entries);
1056
1057 return n_entries;
1058}
1059
1060static void hist_trigger_show(struct seq_file *m,
1061 struct event_trigger_data *data, int n)
1062{
1063 struct hist_trigger_data *hist_data;
1064 int n_entries, ret = 0;
1065
1066 if (n > 0)
1067 seq_puts(m, "\n\n");
1068
1069 seq_puts(m, "# event histogram\n#\n# trigger info: ");
1070 data->ops->print(m, data->ops, data);
1071 seq_puts(m, "#\n\n");
1072
1073 hist_data = data->private_data;
1074 n_entries = print_entries(m, hist_data);
1075 if (n_entries < 0) {
1076 ret = n_entries;
1077 n_entries = 0;
1078 }
1079
1080 seq_printf(m, "\nTotals:\n Hits: %llu\n Entries: %u\n Dropped: %llu\n",
1081 (u64)atomic64_read(&hist_data->map->hits),
1082 n_entries, (u64)atomic64_read(&hist_data->map->drops));
1083}
1084
1085static int hist_show(struct seq_file *m, void *v)
1086{
1087 struct event_trigger_data *data;
1088 struct trace_event_file *event_file;
1089 int n = 0, ret = 0;
1090
1091 mutex_lock(&event_mutex);
1092
1093 event_file = event_file_data(m->private);
1094 if (unlikely(!event_file)) {
1095 ret = -ENODEV;
1096 goto out_unlock;
1097 }
1098
1099 list_for_each_entry_rcu(data, &event_file->triggers, list) {
1100 if (data->cmd_ops->trigger_type == ETT_EVENT_HIST)
1101 hist_trigger_show(m, data, n++);
1102 }
1103
1104 out_unlock:
1105 mutex_unlock(&event_mutex);
1106
1107 return ret;
1108}
1109
1110static int event_hist_open(struct inode *inode, struct file *file)
1111{
1112 return single_open(file, hist_show, file);
1113}
1114
1115const struct file_operations event_hist_fops = {
1116 .open = event_hist_open,
1117 .read = seq_read,
1118 .llseek = seq_lseek,
1119 .release = single_release,
1120};
1121
1122static const char *get_hist_field_flags(struct hist_field *hist_field)
1123{
1124 const char *flags_str = NULL;
1125
1126 if (hist_field->flags & HIST_FIELD_FL_HEX)
1127 flags_str = "hex";
1128 else if (hist_field->flags & HIST_FIELD_FL_SYM)
1129 flags_str = "sym";
1130 else if (hist_field->flags & HIST_FIELD_FL_SYM_OFFSET)
1131 flags_str = "sym-offset";
1132 else if (hist_field->flags & HIST_FIELD_FL_EXECNAME)
1133 flags_str = "execname";
1134 else if (hist_field->flags & HIST_FIELD_FL_SYSCALL)
1135 flags_str = "syscall";
1136 else if (hist_field->flags & HIST_FIELD_FL_LOG2)
1137 flags_str = "log2";
1138
1139 return flags_str;
1140}
1141
1142static void hist_field_print(struct seq_file *m, struct hist_field *hist_field)
1143{
1144 seq_printf(m, "%s", hist_field->field->name);
1145 if (hist_field->flags) {
1146 const char *flags_str = get_hist_field_flags(hist_field);
1147
1148 if (flags_str)
1149 seq_printf(m, ".%s", flags_str);
1150 }
1151}
1152
1153static int event_hist_trigger_print(struct seq_file *m,
1154 struct event_trigger_ops *ops,
1155 struct event_trigger_data *data)
1156{
1157 struct hist_trigger_data *hist_data = data->private_data;
1158 struct hist_field *key_field;
1159 unsigned int i;
1160
1161 seq_puts(m, "hist:");
1162
1163 if (data->name)
1164 seq_printf(m, "%s:", data->name);
1165
1166 seq_puts(m, "keys=");
1167
1168 for_each_hist_key_field(i, hist_data) {
1169 key_field = hist_data->fields[i];
1170
1171 if (i > hist_data->n_vals)
1172 seq_puts(m, ",");
1173
1174 if (key_field->flags & HIST_FIELD_FL_STACKTRACE)
1175 seq_puts(m, "stacktrace");
1176 else
1177 hist_field_print(m, key_field);
1178 }
1179
1180 seq_puts(m, ":vals=");
1181
1182 for_each_hist_val_field(i, hist_data) {
1183 if (i == HITCOUNT_IDX)
1184 seq_puts(m, "hitcount");
1185 else {
1186 seq_puts(m, ",");
1187 hist_field_print(m, hist_data->fields[i]);
1188 }
1189 }
1190
1191 seq_puts(m, ":sort=");
1192
1193 for (i = 0; i < hist_data->n_sort_keys; i++) {
1194 struct tracing_map_sort_key *sort_key;
1195
1196 sort_key = &hist_data->sort_keys[i];
1197
1198 if (i > 0)
1199 seq_puts(m, ",");
1200
1201 if (sort_key->field_idx == HITCOUNT_IDX)
1202 seq_puts(m, "hitcount");
1203 else {
1204 unsigned int idx = sort_key->field_idx;
1205
1206 if (WARN_ON(idx >= TRACING_MAP_FIELDS_MAX))
1207 return -EINVAL;
1208
1209 hist_field_print(m, hist_data->fields[idx]);
1210 }
1211
1212 if (sort_key->descending)
1213 seq_puts(m, ".descending");
1214 }
1215
1216 seq_printf(m, ":size=%u", (1 << hist_data->map->map_bits));
1217
1218 if (data->filter_str)
1219 seq_printf(m, " if %s", data->filter_str);
1220
1221 if (data->paused)
1222 seq_puts(m, " [paused]");
1223 else
1224 seq_puts(m, " [active]");
1225
1226 seq_putc(m, '\n');
1227
1228 return 0;
1229}
1230
1231static int event_hist_trigger_init(struct event_trigger_ops *ops,
1232 struct event_trigger_data *data)
1233{
1234 struct hist_trigger_data *hist_data = data->private_data;
1235
1236 if (!data->ref && hist_data->attrs->name)
1237 save_named_trigger(hist_data->attrs->name, data);
1238
1239 data->ref++;
1240
1241 return 0;
1242}
1243
1244static void event_hist_trigger_free(struct event_trigger_ops *ops,
1245 struct event_trigger_data *data)
1246{
1247 struct hist_trigger_data *hist_data = data->private_data;
1248
1249 if (WARN_ON_ONCE(data->ref <= 0))
1250 return;
1251
1252 data->ref--;
1253 if (!data->ref) {
1254 if (data->name)
1255 del_named_trigger(data);
1256 trigger_data_free(data);
1257 destroy_hist_data(hist_data);
1258 }
1259}
1260
1261static struct event_trigger_ops event_hist_trigger_ops = {
1262 .func = event_hist_trigger,
1263 .print = event_hist_trigger_print,
1264 .init = event_hist_trigger_init,
1265 .free = event_hist_trigger_free,
1266};
1267
1268static int event_hist_trigger_named_init(struct event_trigger_ops *ops,
1269 struct event_trigger_data *data)
1270{
1271 data->ref++;
1272
1273 save_named_trigger(data->named_data->name, data);
1274
1275 event_hist_trigger_init(ops, data->named_data);
1276
1277 return 0;
1278}
1279
1280static void event_hist_trigger_named_free(struct event_trigger_ops *ops,
1281 struct event_trigger_data *data)
1282{
1283 if (WARN_ON_ONCE(data->ref <= 0))
1284 return;
1285
1286 event_hist_trigger_free(ops, data->named_data);
1287
1288 data->ref--;
1289 if (!data->ref) {
1290 del_named_trigger(data);
1291 trigger_data_free(data);
1292 }
1293}
1294
1295static struct event_trigger_ops event_hist_trigger_named_ops = {
1296 .func = event_hist_trigger,
1297 .print = event_hist_trigger_print,
1298 .init = event_hist_trigger_named_init,
1299 .free = event_hist_trigger_named_free,
1300};
1301
1302static struct event_trigger_ops *event_hist_get_trigger_ops(char *cmd,
1303 char *param)
1304{
1305 return &event_hist_trigger_ops;
1306}
1307
1308static void hist_clear(struct event_trigger_data *data)
1309{
1310 struct hist_trigger_data *hist_data = data->private_data;
1311
1312 if (data->name)
1313 pause_named_trigger(data);
1314
1315 synchronize_sched();
1316
1317 tracing_map_clear(hist_data->map);
1318
1319 if (data->name)
1320 unpause_named_trigger(data);
1321}
1322
1323static bool compatible_field(struct ftrace_event_field *field,
1324 struct ftrace_event_field *test_field)
1325{
1326 if (field == test_field)
1327 return true;
1328 if (field == NULL || test_field == NULL)
1329 return false;
1330 if (strcmp(field->name, test_field->name) != 0)
1331 return false;
1332 if (strcmp(field->type, test_field->type) != 0)
1333 return false;
1334 if (field->size != test_field->size)
1335 return false;
1336 if (field->is_signed != test_field->is_signed)
1337 return false;
1338
1339 return true;
1340}
1341
1342static bool hist_trigger_match(struct event_trigger_data *data,
1343 struct event_trigger_data *data_test,
1344 struct event_trigger_data *named_data,
1345 bool ignore_filter)
1346{
1347 struct tracing_map_sort_key *sort_key, *sort_key_test;
1348 struct hist_trigger_data *hist_data, *hist_data_test;
1349 struct hist_field *key_field, *key_field_test;
1350 unsigned int i;
1351
1352 if (named_data && (named_data != data_test) &&
1353 (named_data != data_test->named_data))
1354 return false;
1355
1356 if (!named_data && is_named_trigger(data_test))
1357 return false;
1358
1359 hist_data = data->private_data;
1360 hist_data_test = data_test->private_data;
1361
1362 if (hist_data->n_vals != hist_data_test->n_vals ||
1363 hist_data->n_fields != hist_data_test->n_fields ||
1364 hist_data->n_sort_keys != hist_data_test->n_sort_keys)
1365 return false;
1366
1367 if (!ignore_filter) {
1368 if ((data->filter_str && !data_test->filter_str) ||
1369 (!data->filter_str && data_test->filter_str))
1370 return false;
1371 }
1372
1373 for_each_hist_field(i, hist_data) {
1374 key_field = hist_data->fields[i];
1375 key_field_test = hist_data_test->fields[i];
1376
1377 if (key_field->flags != key_field_test->flags)
1378 return false;
1379 if (!compatible_field(key_field->field, key_field_test->field))
1380 return false;
1381 if (key_field->offset != key_field_test->offset)
1382 return false;
1383 }
1384
1385 for (i = 0; i < hist_data->n_sort_keys; i++) {
1386 sort_key = &hist_data->sort_keys[i];
1387 sort_key_test = &hist_data_test->sort_keys[i];
1388
1389 if (sort_key->field_idx != sort_key_test->field_idx ||
1390 sort_key->descending != sort_key_test->descending)
1391 return false;
1392 }
1393
1394 if (!ignore_filter && data->filter_str &&
1395 (strcmp(data->filter_str, data_test->filter_str) != 0))
1396 return false;
1397
1398 return true;
1399}
1400
1401static int hist_register_trigger(char *glob, struct event_trigger_ops *ops,
1402 struct event_trigger_data *data,
1403 struct trace_event_file *file)
1404{
1405 struct hist_trigger_data *hist_data = data->private_data;
1406 struct event_trigger_data *test, *named_data = NULL;
1407 int ret = 0;
1408
1409 if (hist_data->attrs->name) {
1410 named_data = find_named_trigger(hist_data->attrs->name);
1411 if (named_data) {
1412 if (!hist_trigger_match(data, named_data, named_data,
1413 true)) {
1414 ret = -EINVAL;
1415 goto out;
1416 }
1417 }
1418 }
1419
1420 if (hist_data->attrs->name && !named_data)
1421 goto new;
1422
1423 list_for_each_entry_rcu(test, &file->triggers, list) {
1424 if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) {
1425 if (!hist_trigger_match(data, test, named_data, false))
1426 continue;
1427 if (hist_data->attrs->pause)
1428 test->paused = true;
1429 else if (hist_data->attrs->cont)
1430 test->paused = false;
1431 else if (hist_data->attrs->clear)
1432 hist_clear(test);
1433 else
1434 ret = -EEXIST;
1435 goto out;
1436 }
1437 }
1438 new:
1439 if (hist_data->attrs->cont || hist_data->attrs->clear) {
1440 ret = -ENOENT;
1441 goto out;
1442 }
1443
1444 if (named_data) {
1445 destroy_hist_data(data->private_data);
1446 data->private_data = named_data->private_data;
1447 set_named_trigger_data(data, named_data);
1448 data->ops = &event_hist_trigger_named_ops;
1449 }
1450
1451 if (hist_data->attrs->pause)
1452 data->paused = true;
1453
1454 if (data->ops->init) {
1455 ret = data->ops->init(data->ops, data);
1456 if (ret < 0)
1457 goto out;
1458 }
1459
1460 list_add_rcu(&data->list, &file->triggers);
1461 ret++;
1462
1463 update_cond_flag(file);
1464
1465 if (trace_event_trigger_enable_disable(file, 1) < 0) {
1466 list_del_rcu(&data->list);
1467 update_cond_flag(file);
1468 ret--;
1469 }
1470 out:
1471 return ret;
1472}
1473
1474static void hist_unregister_trigger(char *glob, struct event_trigger_ops *ops,
1475 struct event_trigger_data *data,
1476 struct trace_event_file *file)
1477{
1478 struct hist_trigger_data *hist_data = data->private_data;
1479 struct event_trigger_data *test, *named_data = NULL;
1480 bool unregistered = false;
1481
1482 if (hist_data->attrs->name)
1483 named_data = find_named_trigger(hist_data->attrs->name);
1484
1485 list_for_each_entry_rcu(test, &file->triggers, list) {
1486 if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) {
1487 if (!hist_trigger_match(data, test, named_data, false))
1488 continue;
1489 unregistered = true;
1490 list_del_rcu(&test->list);
1491 trace_event_trigger_enable_disable(file, 0);
1492 update_cond_flag(file);
1493 break;
1494 }
1495 }
1496
1497 if (unregistered && test->ops->free)
1498 test->ops->free(test->ops, test);
1499}
1500
1501static void hist_unreg_all(struct trace_event_file *file)
1502{
1503 struct event_trigger_data *test;
1504
1505 list_for_each_entry_rcu(test, &file->triggers, list) {
1506 if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) {
1507 list_del_rcu(&test->list);
1508 trace_event_trigger_enable_disable(file, 0);
1509 update_cond_flag(file);
1510 if (test->ops->free)
1511 test->ops->free(test->ops, test);
1512 }
1513 }
1514}
1515
1516static int event_hist_trigger_func(struct event_command *cmd_ops,
1517 struct trace_event_file *file,
1518 char *glob, char *cmd, char *param)
1519{
1520 unsigned int hist_trigger_bits = TRACING_MAP_BITS_DEFAULT;
1521 struct event_trigger_data *trigger_data;
1522 struct hist_trigger_attrs *attrs;
1523 struct event_trigger_ops *trigger_ops;
1524 struct hist_trigger_data *hist_data;
1525 char *trigger;
1526 int ret = 0;
1527
1528 if (!param)
1529 return -EINVAL;
1530
1531 /* separate the trigger from the filter (k:v [if filter]) */
1532 trigger = strsep(&param, " \t");
1533 if (!trigger)
1534 return -EINVAL;
1535
1536 attrs = parse_hist_trigger_attrs(trigger);
1537 if (IS_ERR(attrs))
1538 return PTR_ERR(attrs);
1539
1540 if (attrs->map_bits)
1541 hist_trigger_bits = attrs->map_bits;
1542
1543 hist_data = create_hist_data(hist_trigger_bits, attrs, file);
1544 if (IS_ERR(hist_data)) {
1545 destroy_hist_trigger_attrs(attrs);
1546 return PTR_ERR(hist_data);
1547 }
1548
1549 trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger);
1550
1551 ret = -ENOMEM;
1552 trigger_data = kzalloc(sizeof(*trigger_data), GFP_KERNEL);
1553 if (!trigger_data)
1554 goto out_free;
1555
1556 trigger_data->count = -1;
1557 trigger_data->ops = trigger_ops;
1558 trigger_data->cmd_ops = cmd_ops;
1559
1560 INIT_LIST_HEAD(&trigger_data->list);
1561 RCU_INIT_POINTER(trigger_data->filter, NULL);
1562
1563 trigger_data->private_data = hist_data;
1564
1565 /* if param is non-empty, it's supposed to be a filter */
1566 if (param && cmd_ops->set_filter) {
1567 ret = cmd_ops->set_filter(param, trigger_data, file);
1568 if (ret < 0)
1569 goto out_free;
1570 }
1571
1572 if (glob[0] == '!') {
1573 cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
1574 ret = 0;
1575 goto out_free;
1576 }
1577
1578 ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
1579 /*
1580 * The above returns on success the # of triggers registered,
1581 * but if it didn't register any it returns zero. Consider no
1582 * triggers registered a failure too.
1583 */
1584 if (!ret) {
1585 if (!(attrs->pause || attrs->cont || attrs->clear))
1586 ret = -ENOENT;
1587 goto out_free;
1588 } else if (ret < 0)
1589 goto out_free;
1590 /* Just return zero, not the number of registered triggers */
1591 ret = 0;
1592 out:
1593 return ret;
1594 out_free:
1595 if (cmd_ops->set_filter)
1596 cmd_ops->set_filter(NULL, trigger_data, NULL);
1597
1598 kfree(trigger_data);
1599
1600 destroy_hist_data(hist_data);
1601 goto out;
1602}
1603
1604static struct event_command trigger_hist_cmd = {
1605 .name = "hist",
1606 .trigger_type = ETT_EVENT_HIST,
1607 .flags = EVENT_CMD_FL_NEEDS_REC,
1608 .func = event_hist_trigger_func,
1609 .reg = hist_register_trigger,
1610 .unreg = hist_unregister_trigger,
1611 .unreg_all = hist_unreg_all,
1612 .get_trigger_ops = event_hist_get_trigger_ops,
1613 .set_filter = set_trigger_filter,
1614};
1615
1616__init int register_trigger_hist_cmd(void)
1617{
1618 int ret;
1619
1620 ret = register_event_command(&trigger_hist_cmd);
1621 WARN_ON(ret < 0);
1622
1623 return ret;
1624}
1625
1626static void
1627hist_enable_trigger(struct event_trigger_data *data, void *rec)
1628{
1629 struct enable_trigger_data *enable_data = data->private_data;
1630 struct event_trigger_data *test;
1631
1632 list_for_each_entry_rcu(test, &enable_data->file->triggers, list) {
1633 if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) {
1634 if (enable_data->enable)
1635 test->paused = false;
1636 else
1637 test->paused = true;
1638 }
1639 }
1640}
1641
1642static void
1643hist_enable_count_trigger(struct event_trigger_data *data, void *rec)
1644{
1645 if (!data->count)
1646 return;
1647
1648 if (data->count != -1)
1649 (data->count)--;
1650
1651 hist_enable_trigger(data, rec);
1652}
1653
1654static struct event_trigger_ops hist_enable_trigger_ops = {
1655 .func = hist_enable_trigger,
1656 .print = event_enable_trigger_print,
1657 .init = event_trigger_init,
1658 .free = event_enable_trigger_free,
1659};
1660
1661static struct event_trigger_ops hist_enable_count_trigger_ops = {
1662 .func = hist_enable_count_trigger,
1663 .print = event_enable_trigger_print,
1664 .init = event_trigger_init,
1665 .free = event_enable_trigger_free,
1666};
1667
1668static struct event_trigger_ops hist_disable_trigger_ops = {
1669 .func = hist_enable_trigger,
1670 .print = event_enable_trigger_print,
1671 .init = event_trigger_init,
1672 .free = event_enable_trigger_free,
1673};
1674
1675static struct event_trigger_ops hist_disable_count_trigger_ops = {
1676 .func = hist_enable_count_trigger,
1677 .print = event_enable_trigger_print,
1678 .init = event_trigger_init,
1679 .free = event_enable_trigger_free,
1680};
1681
1682static struct event_trigger_ops *
1683hist_enable_get_trigger_ops(char *cmd, char *param)
1684{
1685 struct event_trigger_ops *ops;
1686 bool enable;
1687
1688 enable = (strcmp(cmd, ENABLE_HIST_STR) == 0);
1689
1690 if (enable)
1691 ops = param ? &hist_enable_count_trigger_ops :
1692 &hist_enable_trigger_ops;
1693 else
1694 ops = param ? &hist_disable_count_trigger_ops :
1695 &hist_disable_trigger_ops;
1696
1697 return ops;
1698}
1699
1700static void hist_enable_unreg_all(struct trace_event_file *file)
1701{
1702 struct event_trigger_data *test;
1703
1704 list_for_each_entry_rcu(test, &file->triggers, list) {
1705 if (test->cmd_ops->trigger_type == ETT_HIST_ENABLE) {
1706 list_del_rcu(&test->list);
1707 update_cond_flag(file);
1708 trace_event_trigger_enable_disable(file, 0);
1709 if (test->ops->free)
1710 test->ops->free(test->ops, test);
1711 }
1712 }
1713}
1714
1715static struct event_command trigger_hist_enable_cmd = {
1716 .name = ENABLE_HIST_STR,
1717 .trigger_type = ETT_HIST_ENABLE,
1718 .func = event_enable_trigger_func,
1719 .reg = event_enable_register_trigger,
1720 .unreg = event_enable_unregister_trigger,
1721 .unreg_all = hist_enable_unreg_all,
1722 .get_trigger_ops = hist_enable_get_trigger_ops,
1723 .set_filter = set_trigger_filter,
1724};
1725
1726static struct event_command trigger_hist_disable_cmd = {
1727 .name = DISABLE_HIST_STR,
1728 .trigger_type = ETT_HIST_ENABLE,
1729 .func = event_enable_trigger_func,
1730 .reg = event_enable_register_trigger,
1731 .unreg = event_enable_unregister_trigger,
1732 .unreg_all = hist_enable_unreg_all,
1733 .get_trigger_ops = hist_enable_get_trigger_ops,
1734 .set_filter = set_trigger_filter,
1735};
1736
1737static __init void unregister_trigger_hist_enable_disable_cmds(void)
1738{
1739 unregister_event_command(&trigger_hist_enable_cmd);
1740 unregister_event_command(&trigger_hist_disable_cmd);
1741}
1742
1743__init int register_trigger_hist_enable_disable_cmds(void)
1744{
1745 int ret;
1746
1747 ret = register_event_command(&trigger_hist_enable_cmd);
1748 if (WARN_ON(ret < 0))
1749 return ret;
1750 ret = register_event_command(&trigger_hist_disable_cmd);
1751 if (WARN_ON(ret < 0))
1752 unregister_trigger_hist_enable_disable_cmds();
1753
1754 return ret;
1755}
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index d67992f3bb0e..a975571cde24 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -347,7 +347,7 @@ __init int register_event_command(struct event_command *cmd)
347 * Currently we only unregister event commands from __init, so mark 347 * Currently we only unregister event commands from __init, so mark
348 * this __init too. 348 * this __init too.
349 */ 349 */
350static __init int unregister_event_command(struct event_command *cmd) 350__init int unregister_event_command(struct event_command *cmd)
351{ 351{
352 struct event_command *p, *n; 352 struct event_command *p, *n;
353 int ret = -ENODEV; 353 int ret = -ENODEV;
@@ -641,6 +641,7 @@ event_trigger_callback(struct event_command *cmd_ops,
641 trigger_data->ops = trigger_ops; 641 trigger_data->ops = trigger_ops;
642 trigger_data->cmd_ops = cmd_ops; 642 trigger_data->cmd_ops = cmd_ops;
643 INIT_LIST_HEAD(&trigger_data->list); 643 INIT_LIST_HEAD(&trigger_data->list);
644 INIT_LIST_HEAD(&trigger_data->named_list);
644 645
645 if (glob[0] == '!') { 646 if (glob[0] == '!') {
646 cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file); 647 cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
@@ -764,6 +765,148 @@ int set_trigger_filter(char *filter_str,
764 return ret; 765 return ret;
765} 766}
766 767
768static LIST_HEAD(named_triggers);
769
770/**
771 * find_named_trigger - Find the common named trigger associated with @name
772 * @name: The name of the set of named triggers to find the common data for
773 *
774 * Named triggers are sets of triggers that share a common set of
775 * trigger data. The first named trigger registered with a given name
776 * owns the common trigger data that the others subsequently
777 * registered with the same name will reference. This function
778 * returns the common trigger data associated with that first
779 * registered instance.
780 *
781 * Return: the common trigger data for the given named trigger on
782 * success, NULL otherwise.
783 */
784struct event_trigger_data *find_named_trigger(const char *name)
785{
786 struct event_trigger_data *data;
787
788 if (!name)
789 return NULL;
790
791 list_for_each_entry(data, &named_triggers, named_list) {
792 if (data->named_data)
793 continue;
794 if (strcmp(data->name, name) == 0)
795 return data;
796 }
797
798 return NULL;
799}
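
   A minimal sketch of how a caller adopts the shared data this function
   returns, mirroring the hist_register_trigger() hunk earlier in this
   patch (the name "foo" is illustrative, 'data' is assumed to be the
   newly created trigger, and freeing its old private_data is elided):

	struct event_trigger_data *named_data;

	named_data = find_named_trigger("foo");	/* NULL if "foo" unused */
	if (named_data) {
		/* share the first-registered trigger's common data */
		data->private_data = named_data->private_data;
		set_named_trigger_data(data, named_data);
	}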
800
801/**
802 * is_named_trigger - determine if a given trigger is a named trigger
803 * @test: The trigger data to test
804 *
805 * Return: true if 'test' is a named trigger, false otherwise.
806 */
807bool is_named_trigger(struct event_trigger_data *test)
808{
809 struct event_trigger_data *data;
810
811 list_for_each_entry(data, &named_triggers, named_list) {
812 if (test == data)
813 return true;
814 }
815
816 return false;
817}
818
819/**
820 * save_named_trigger - save the trigger in the named trigger list
821 * @name: The name of the named trigger set
822 * @data: The trigger data to save
823 *
824 * Return: 0 if successful, negative error otherwise.
825 */
826int save_named_trigger(const char *name, struct event_trigger_data *data)
827{
828 data->name = kstrdup(name, GFP_KERNEL);
829 if (!data->name)
830 return -ENOMEM;
831
832 list_add(&data->named_list, &named_triggers);
833
834 return 0;
835}
836
837/**
838 * del_named_trigger - delete a trigger from the named trigger list
839 * @data: The trigger data to delete
840 */
841void del_named_trigger(struct event_trigger_data *data)
842{
843 kfree(data->name);
844 data->name = NULL;
845
846 list_del(&data->named_list);
847}
848
849static void __pause_named_trigger(struct event_trigger_data *data, bool pause)
850{
851 struct event_trigger_data *test;
852
853 list_for_each_entry(test, &named_triggers, named_list) {
854 if (strcmp(test->name, data->name) == 0) {
855 if (pause) {
856 test->paused_tmp = test->paused;
857 test->paused = true;
858 } else {
859 test->paused = test->paused_tmp;
860 }
861 }
862 }
863}
864
865/**
866 * pause_named_trigger - Pause all named triggers with the same name
867 * @data: The trigger data of a named trigger to pause
868 *
869 * Pauses a named trigger along with all other triggers having the
870 * same name. Because named triggers share a common set of data,
871 * pausing only one is meaningless, so pausing one named trigger needs
872 * to pause all triggers with the same name.
873 */
874void pause_named_trigger(struct event_trigger_data *data)
875{
876 __pause_named_trigger(data, true);
877}
878
879/**
880 * unpause_named_trigger - Un-pause all named triggers with the same name
881 * @data: The trigger data of a named trigger to unpause
882 *
883 * Un-pauses a named trigger along with all other triggers having the
884 * same name. Because named triggers share a common set of data,
885 * unpausing only one is meaningless, so unpausing one named trigger
886 * needs to unpause all triggers with the same name.
887 */
888void unpause_named_trigger(struct event_trigger_data *data)
889{
890 __pause_named_trigger(data, false);
891}
892
893/**
894 * set_named_trigger_data - Associate common named trigger data
895 * @data: The trigger data to associate with the common named data
896 *
897 * Named triggers are sets of triggers that share a common set of
898 * trigger data. The first named trigger registered with a given name
899 * owns the common trigger data that the others subsequently
900 * registered with the same name will reference. This function
901 * associates the common trigger data from the first trigger with the
902 * given trigger.
903 */
904void set_named_trigger_data(struct event_trigger_data *data,
905 struct event_trigger_data *named_data)
906{
907 data->named_data = named_data;
908}
909
767static void 910static void
768traceon_trigger(struct event_trigger_data *data, void *rec) 911traceon_trigger(struct event_trigger_data *data, void *rec)
769{ 912{
@@ -1062,15 +1205,6 @@ static __init void unregister_trigger_traceon_traceoff_cmds(void)
1062 unregister_event_command(&trigger_traceoff_cmd); 1205 unregister_event_command(&trigger_traceoff_cmd);
1063} 1206}
1064 1207
1065/* Avoid typos */
1066#define ENABLE_EVENT_STR "enable_event"
1067#define DISABLE_EVENT_STR "disable_event"
1068
1069struct enable_trigger_data {
1070 struct trace_event_file *file;
1071 bool enable;
1072};
1073
1074static void 1208static void
1075event_enable_trigger(struct event_trigger_data *data, void *rec) 1209event_enable_trigger(struct event_trigger_data *data, void *rec)
1076{ 1210{
@@ -1100,14 +1234,16 @@ event_enable_count_trigger(struct event_trigger_data *data, void *rec)
1100 event_enable_trigger(data, rec); 1234 event_enable_trigger(data, rec);
1101} 1235}
1102 1236
1103static int 1237int event_enable_trigger_print(struct seq_file *m,
1104event_enable_trigger_print(struct seq_file *m, struct event_trigger_ops *ops, 1238 struct event_trigger_ops *ops,
1105 struct event_trigger_data *data) 1239 struct event_trigger_data *data)
1106{ 1240{
1107 struct enable_trigger_data *enable_data = data->private_data; 1241 struct enable_trigger_data *enable_data = data->private_data;
1108 1242
1109 seq_printf(m, "%s:%s:%s", 1243 seq_printf(m, "%s:%s:%s",
1110 enable_data->enable ? ENABLE_EVENT_STR : DISABLE_EVENT_STR, 1244 enable_data->hist ?
1245 (enable_data->enable ? ENABLE_HIST_STR : DISABLE_HIST_STR) :
1246 (enable_data->enable ? ENABLE_EVENT_STR : DISABLE_EVENT_STR),
1111 enable_data->file->event_call->class->system, 1247 enable_data->file->event_call->class->system,
1112 trace_event_name(enable_data->file->event_call)); 1248 trace_event_name(enable_data->file->event_call));
1113 1249
@@ -1124,9 +1260,8 @@ event_enable_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
1124 return 0; 1260 return 0;
1125} 1261}
1126 1262
1127static void 1263void event_enable_trigger_free(struct event_trigger_ops *ops,
1128event_enable_trigger_free(struct event_trigger_ops *ops, 1264 struct event_trigger_data *data)
1129 struct event_trigger_data *data)
1130{ 1265{
1131 struct enable_trigger_data *enable_data = data->private_data; 1266 struct enable_trigger_data *enable_data = data->private_data;
1132 1267
@@ -1171,10 +1306,9 @@ static struct event_trigger_ops event_disable_count_trigger_ops = {
1171 .free = event_enable_trigger_free, 1306 .free = event_enable_trigger_free,
1172}; 1307};
1173 1308
1174static int 1309int event_enable_trigger_func(struct event_command *cmd_ops,
1175event_enable_trigger_func(struct event_command *cmd_ops, 1310 struct trace_event_file *file,
1176 struct trace_event_file *file, 1311 char *glob, char *cmd, char *param)
1177 char *glob, char *cmd, char *param)
1178{ 1312{
1179 struct trace_event_file *event_enable_file; 1313 struct trace_event_file *event_enable_file;
1180 struct enable_trigger_data *enable_data; 1314 struct enable_trigger_data *enable_data;
@@ -1183,6 +1317,7 @@ event_enable_trigger_func(struct event_command *cmd_ops,
1183 struct trace_array *tr = file->tr; 1317 struct trace_array *tr = file->tr;
1184 const char *system; 1318 const char *system;
1185 const char *event; 1319 const char *event;
1320 bool hist = false;
1186 char *trigger; 1321 char *trigger;
1187 char *number; 1322 char *number;
1188 bool enable; 1323 bool enable;
@@ -1207,8 +1342,15 @@ event_enable_trigger_func(struct event_command *cmd_ops,
1207 if (!event_enable_file) 1342 if (!event_enable_file)
1208 goto out; 1343 goto out;
1209 1344
1210 enable = strcmp(cmd, ENABLE_EVENT_STR) == 0; 1345#ifdef CONFIG_HIST_TRIGGERS
1346 hist = ((strcmp(cmd, ENABLE_HIST_STR) == 0) ||
1347 (strcmp(cmd, DISABLE_HIST_STR) == 0));
1211 1348
1349 enable = ((strcmp(cmd, ENABLE_EVENT_STR) == 0) ||
1350 (strcmp(cmd, ENABLE_HIST_STR) == 0));
1351#else
1352 enable = strcmp(cmd, ENABLE_EVENT_STR) == 0;
1353#endif
1212 trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger); 1354 trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger);
1213 1355
1214 ret = -ENOMEM; 1356 ret = -ENOMEM;
@@ -1228,6 +1370,7 @@ event_enable_trigger_func(struct event_command *cmd_ops,
1228 INIT_LIST_HEAD(&trigger_data->list); 1370 INIT_LIST_HEAD(&trigger_data->list);
1229 RCU_INIT_POINTER(trigger_data->filter, NULL); 1371 RCU_INIT_POINTER(trigger_data->filter, NULL);
1230 1372
1373 enable_data->hist = hist;
1231 enable_data->enable = enable; 1374 enable_data->enable = enable;
1232 enable_data->file = event_enable_file; 1375 enable_data->file = event_enable_file;
1233 trigger_data->private_data = enable_data; 1376 trigger_data->private_data = enable_data;
@@ -1305,10 +1448,10 @@ event_enable_trigger_func(struct event_command *cmd_ops,
1305 goto out; 1448 goto out;
1306} 1449}
1307 1450
1308static int event_enable_register_trigger(char *glob, 1451int event_enable_register_trigger(char *glob,
1309 struct event_trigger_ops *ops, 1452 struct event_trigger_ops *ops,
1310 struct event_trigger_data *data, 1453 struct event_trigger_data *data,
1311 struct trace_event_file *file) 1454 struct trace_event_file *file)
1312{ 1455{
1313 struct enable_trigger_data *enable_data = data->private_data; 1456 struct enable_trigger_data *enable_data = data->private_data;
1314 struct enable_trigger_data *test_enable_data; 1457 struct enable_trigger_data *test_enable_data;
@@ -1318,6 +1461,8 @@ static int event_enable_register_trigger(char *glob,
1318 list_for_each_entry_rcu(test, &file->triggers, list) { 1461 list_for_each_entry_rcu(test, &file->triggers, list) {
1319 test_enable_data = test->private_data; 1462 test_enable_data = test->private_data;
1320 if (test_enable_data && 1463 if (test_enable_data &&
1464 (test->cmd_ops->trigger_type ==
1465 data->cmd_ops->trigger_type) &&
1321 (test_enable_data->file == enable_data->file)) { 1466 (test_enable_data->file == enable_data->file)) {
1322 ret = -EEXIST; 1467 ret = -EEXIST;
1323 goto out; 1468 goto out;
@@ -1343,10 +1488,10 @@ out:
1343 return ret; 1488 return ret;
1344} 1489}
1345 1490
1346static void event_enable_unregister_trigger(char *glob, 1491void event_enable_unregister_trigger(char *glob,
1347 struct event_trigger_ops *ops, 1492 struct event_trigger_ops *ops,
1348 struct event_trigger_data *test, 1493 struct event_trigger_data *test,
1349 struct trace_event_file *file) 1494 struct trace_event_file *file)
1350{ 1495{
1351 struct enable_trigger_data *test_enable_data = test->private_data; 1496 struct enable_trigger_data *test_enable_data = test->private_data;
1352 struct enable_trigger_data *enable_data; 1497 struct enable_trigger_data *enable_data;
@@ -1356,6 +1501,8 @@ static void event_enable_unregister_trigger(char *glob,
1356 list_for_each_entry_rcu(data, &file->triggers, list) { 1501 list_for_each_entry_rcu(data, &file->triggers, list) {
1357 enable_data = data->private_data; 1502 enable_data = data->private_data;
1358 if (enable_data && 1503 if (enable_data &&
1504 (data->cmd_ops->trigger_type ==
1505 test->cmd_ops->trigger_type) &&
1359 (enable_data->file == test_enable_data->file)) { 1506 (enable_data->file == test_enable_data->file)) {
1360 unregistered = true; 1507 unregistered = true;
1361 list_del_rcu(&data->list); 1508 list_del_rcu(&data->list);
@@ -1375,8 +1522,12 @@ event_enable_get_trigger_ops(char *cmd, char *param)
1375 struct event_trigger_ops *ops; 1522 struct event_trigger_ops *ops;
1376 bool enable; 1523 bool enable;
1377 1524
1525#ifdef CONFIG_HIST_TRIGGERS
1526 enable = ((strcmp(cmd, ENABLE_EVENT_STR) == 0) ||
1527 (strcmp(cmd, ENABLE_HIST_STR) == 0));
1528#else
1378 enable = strcmp(cmd, ENABLE_EVENT_STR) == 0; 1529 enable = strcmp(cmd, ENABLE_EVENT_STR) == 0;
1379 1530#endif
1380 if (enable) 1531 if (enable)
1381 ops = param ? &event_enable_count_trigger_ops : 1532 ops = param ? &event_enable_count_trigger_ops :
1382 &event_enable_trigger_ops; 1533 &event_enable_trigger_ops;
@@ -1447,6 +1598,8 @@ __init int register_trigger_cmds(void)
1447 register_trigger_snapshot_cmd(); 1598 register_trigger_snapshot_cmd();
1448 register_trigger_stacktrace_cmd(); 1599 register_trigger_stacktrace_cmd();
1449 register_trigger_enable_disable_cmds(); 1600 register_trigger_enable_disable_cmds();
1601 register_trigger_hist_enable_disable_cmds();
1602 register_trigger_hist_cmd();
1450 1603
1451 return 0; 1604 return 0;
1452} 1605}
diff --git a/kernel/trace/tracing_map.c b/kernel/trace/tracing_map.c
new file mode 100644
index 000000000000..0a689bbb78ef
--- /dev/null
+++ b/kernel/trace/tracing_map.c
@@ -0,0 +1,1062 @@
1/*
2 * tracing_map - lock-free map for tracing
3 *
4 * This program is free software; you can redistribute it and/or modify
5 * it under the terms of the GNU General Public License as published by
6 * the Free Software Foundation; either version 2 of the License, or
7 * (at your option) any later version.
8 *
9 * This program is distributed in the hope that it will be useful,
10 * but WITHOUT ANY WARRANTY; without even the implied warranty of
11 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
12 * GNU General Public License for more details.
13 *
14 * Copyright (C) 2015 Tom Zanussi <tom.zanussi@linux.intel.com>
15 *
16 * tracing_map implementation inspired by lock-free map algorithms
17 * originated by Dr. Cliff Click:
18 *
19 * http://www.azulsystems.com/blog/cliff/2007-03-26-non-blocking-hashtable
20 * http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf
21 */
22
23#include <linux/vmalloc.h>
24#include <linux/jhash.h>
25#include <linux/slab.h>
26#include <linux/sort.h>
27
28#include "tracing_map.h"
29#include "trace.h"
30
31/*
32 * NOTE: For a detailed description of the data structures used by
33 * these functions (such as tracing_map_elt) please see the overview
34 * of tracing_map data structures at the beginning of tracing_map.h.
35 */
36
37/**
38 * tracing_map_update_sum - Add a value to a tracing_map_elt's sum field
39 * @elt: The tracing_map_elt
40 * @i: The index of the given sum associated with the tracing_map_elt
41 * @n: The value to add to the sum
42 *
43 * Add n to sum i associated with the specified tracing_map_elt
44 * instance. The index i is the index returned by the call to
45 * tracing_map_add_sum_field() when the tracing map was set up.
46 */
47void tracing_map_update_sum(struct tracing_map_elt *elt, unsigned int i, u64 n)
48{
49 atomic64_add(n, &elt->fields[i].sum);
50}
51
52/**
53 * tracing_map_read_sum - Return the value of a tracing_map_elt's sum field
54 * @elt: The tracing_map_elt
55 * @i: The index of the given sum associated with the tracing_map_elt
56 *
57 * Retrieve the value of the sum i associated with the specified
58 * tracing_map_elt instance. The index i is the index returned by the
59 * call to tracing_map_add_sum_field() when the tracing map was set
60 * up.
61 *
62 * Return: The sum associated with field i for elt.
63 */
64u64 tracing_map_read_sum(struct tracing_map_elt *elt, unsigned int i)
65{
66 return (u64)atomic64_read(&elt->fields[i].sum);
67}
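
   Together these two helpers are the whole sum API; a hedged sketch,
   where sum_idx is the index returned by tracing_map_add_sum_field()
   at setup time and bytes_req is an illustrative event value:

	/* setup time: describe the sum field and remember its index */
	int sum_idx = tracing_map_add_sum_field(map);

	/* event fast path: accumulate into an element's running total */
	tracing_map_update_sum(elt, sum_idx, bytes_req);

	/* reader side: fetch the current total */
	u64 total = tracing_map_read_sum(elt, sum_idx);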
68
69int tracing_map_cmp_string(void *val_a, void *val_b)
70{
71 char *a = val_a;
72 char *b = val_b;
73
74 return strcmp(a, b);
75}
76
77int tracing_map_cmp_none(void *val_a, void *val_b)
78{
79 return 0;
80}
81
82static int tracing_map_cmp_atomic64(void *val_a, void *val_b)
83{
84 u64 a = atomic64_read((atomic64_t *)val_a);
85 u64 b = atomic64_read((atomic64_t *)val_b);
86
87 return (a > b) ? 1 : ((a < b) ? -1 : 0);
88}
89
90#define DEFINE_TRACING_MAP_CMP_FN(type) \
91static int tracing_map_cmp_##type(void *val_a, void *val_b) \
92{ \
93 type a = *(type *)val_a; \
94 type b = *(type *)val_b; \
95 \
96 return (a > b) ? 1 : ((a < b) ? -1 : 0); \
97}
98
99DEFINE_TRACING_MAP_CMP_FN(s64);
100DEFINE_TRACING_MAP_CMP_FN(u64);
101DEFINE_TRACING_MAP_CMP_FN(s32);
102DEFINE_TRACING_MAP_CMP_FN(u32);
103DEFINE_TRACING_MAP_CMP_FN(s16);
104DEFINE_TRACING_MAP_CMP_FN(u16);
105DEFINE_TRACING_MAP_CMP_FN(s8);
106DEFINE_TRACING_MAP_CMP_FN(u8);
107
108tracing_map_cmp_fn_t tracing_map_cmp_num(int field_size,
109 int field_is_signed)
110{
111 tracing_map_cmp_fn_t fn = tracing_map_cmp_none;
112
113 switch (field_size) {
114 case 8:
115 if (field_is_signed)
116 fn = tracing_map_cmp_s64;
117 else
118 fn = tracing_map_cmp_u64;
119 break;
120 case 4:
121 if (field_is_signed)
122 fn = tracing_map_cmp_s32;
123 else
124 fn = tracing_map_cmp_u32;
125 break;
126 case 2:
127 if (field_is_signed)
128 fn = tracing_map_cmp_s16;
129 else
130 fn = tracing_map_cmp_u16;
131 break;
132 case 1:
133 if (field_is_signed)
134 fn = tracing_map_cmp_s8;
135 else
136 fn = tracing_map_cmp_u8;
137 break;
138 }
139
140 return fn;
141}
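
   The selection is purely by size and signedness; for example (a
   sketch, with 'field' assumed to be a struct ftrace_event_field):

	/* an 8-byte signed event field resolves to tracing_map_cmp_s64 */
	tracing_map_cmp_fn_t cmp_fn;

	cmp_fn = tracing_map_cmp_num(field->size, field->is_signed);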
142
143static int tracing_map_add_field(struct tracing_map *map,
144 tracing_map_cmp_fn_t cmp_fn)
145{
146 int ret = -EINVAL;
147
148 if (map->n_fields < TRACING_MAP_FIELDS_MAX) {
149 ret = map->n_fields;
150 map->fields[map->n_fields++].cmp_fn = cmp_fn;
151 }
152
153 return ret;
154}
155
156/**
157 * tracing_map_add_sum_field - Add a field describing a tracing_map sum
158 * @map: The tracing_map
159 *
160 * Add a sum field to the map and return the index identifying it in
161 * the map and associated tracing_map_elts. This is the index used
162 * for instance to update a sum for a particular tracing_map_elt using
163 * tracing_map_update_sum() or reading it via tracing_map_read_sum().
164 *
165 * Return: The index identifying the field in the map and associated
166 * tracing_map_elts, or -EINVAL on error.
167 */
168int tracing_map_add_sum_field(struct tracing_map *map)
169{
170 return tracing_map_add_field(map, tracing_map_cmp_atomic64);
171}
172
173/**
174 * tracing_map_add_key_field - Add a field describing a tracing_map key
175 * @map: The tracing_map
176 * @offset: The offset within the key
177 * @cmp_fn: The comparison function that will be used to sort on the key
178 *
179 * Let the map know there is a key and that if it's used as a sort key
180 * to use cmp_fn.
181 *
182 * A key can be a subset of a compound key; for that purpose, the
182 * offset param is used to describe where within the compound key
184 * the key referenced by this key field resides.
185 *
186 * Return: The index identifying the field in the map and associated
187 * tracing_map_elts, or -EINVAL on error.
188 */
189int tracing_map_add_key_field(struct tracing_map *map,
190 unsigned int offset,
191 tracing_map_cmp_fn_t cmp_fn)
192
193{
194 int idx = tracing_map_add_field(map, cmp_fn);
195
196 if (idx < 0)
197 return idx;
198
199 map->fields[idx].offset = offset;
200
201 map->key_idx[map->n_keys++] = idx;
202
203 return idx;
204}
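
   A sketch of a compound key built from two unsigned 64-bit values
   laid out back to back in the key buffer; each call returns that
   field's index within the map:

	/* first component at offset 0, second immediately after it */
	tracing_map_add_key_field(map, 0,
				  tracing_map_cmp_num(sizeof(u64), 0));
	tracing_map_add_key_field(map, sizeof(u64),
				  tracing_map_cmp_num(sizeof(u64), 0));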
205
206void tracing_map_array_clear(struct tracing_map_array *a)
207{
208 unsigned int i;
209
210 if (!a->pages)
211 return;
212
213 for (i = 0; i < a->n_pages; i++)
214 memset(a->pages[i], 0, PAGE_SIZE);
215}
216
217void tracing_map_array_free(struct tracing_map_array *a)
218{
219 unsigned int i;
220
221 if (!a)
222 return;
223
224 if (!a->pages) {
225 kfree(a);
226 return;
227 }
228
229 for (i = 0; i < a->n_pages; i++) {
230 if (!a->pages[i])
231 break;
232 free_page((unsigned long)a->pages[i]);
233 }
234}
235
236struct tracing_map_array *tracing_map_array_alloc(unsigned int n_elts,
237 unsigned int entry_size)
238{
239 struct tracing_map_array *a;
240 unsigned int i;
241
242 a = kzalloc(sizeof(*a), GFP_KERNEL);
243 if (!a)
244 return NULL;
245
246 a->entry_size_shift = fls(roundup_pow_of_two(entry_size) - 1);
247 a->entries_per_page = PAGE_SIZE / (1 << a->entry_size_shift);
248 a->n_pages = n_elts / a->entries_per_page;
249 if (!a->n_pages)
250 a->n_pages = 1;
251 a->entry_shift = fls(a->entries_per_page) - 1;
252 a->entry_mask = (1 << a->entry_shift) - 1;
253
254 a->pages = kcalloc(a->n_pages, sizeof(void *), GFP_KERNEL);
255 if (!a->pages)
256 goto free;
257
258 for (i = 0; i < a->n_pages; i++) {
259 a->pages[i] = (void *)get_zeroed_page(GFP_KERNEL);
260 if (!a->pages[i])
261 goto free;
262 }
263 out:
264 return a;
265 free:
266 tracing_map_array_free(a);
267 a = NULL;
268
269 goto out;
270}
271
272static void tracing_map_elt_clear(struct tracing_map_elt *elt)
273{
274 unsigned i;
275
276 for (i = 0; i < elt->map->n_fields; i++)
277 if (elt->fields[i].cmp_fn == tracing_map_cmp_atomic64)
278 atomic64_set(&elt->fields[i].sum, 0);
279
280 if (elt->map->ops && elt->map->ops->elt_clear)
281 elt->map->ops->elt_clear(elt);
282}
283
284static void tracing_map_elt_init_fields(struct tracing_map_elt *elt)
285{
286 unsigned int i;
287
288 tracing_map_elt_clear(elt);
289
290 for (i = 0; i < elt->map->n_fields; i++) {
291 elt->fields[i].cmp_fn = elt->map->fields[i].cmp_fn;
292
293 if (elt->fields[i].cmp_fn != tracing_map_cmp_atomic64)
294 elt->fields[i].offset = elt->map->fields[i].offset;
295 }
296}
297
298static void tracing_map_elt_free(struct tracing_map_elt *elt)
299{
300 if (!elt)
301 return;
302
303 if (elt->map->ops && elt->map->ops->elt_free)
304 elt->map->ops->elt_free(elt);
305 kfree(elt->fields);
306 kfree(elt->key);
307 kfree(elt);
308}
309
310static struct tracing_map_elt *tracing_map_elt_alloc(struct tracing_map *map)
311{
312 struct tracing_map_elt *elt;
313 int err = 0;
314
315 elt = kzalloc(sizeof(*elt), GFP_KERNEL);
316 if (!elt)
317 return ERR_PTR(-ENOMEM);
318
319 elt->map = map;
320
321 elt->key = kzalloc(map->key_size, GFP_KERNEL);
322 if (!elt->key) {
323 err = -ENOMEM;
324 goto free;
325 }
326
327 elt->fields = kcalloc(map->n_fields, sizeof(*elt->fields), GFP_KERNEL);
328 if (!elt->fields) {
329 err = -ENOMEM;
330 goto free;
331 }
332
333 tracing_map_elt_init_fields(elt);
334
335 if (map->ops && map->ops->elt_alloc) {
336 err = map->ops->elt_alloc(elt);
337 if (err)
338 goto free;
339 }
340 return elt;
341 free:
342 tracing_map_elt_free(elt);
343
344 return ERR_PTR(err);
345}
346
347static struct tracing_map_elt *get_free_elt(struct tracing_map *map)
348{
349 struct tracing_map_elt *elt = NULL;
350 int idx;
351
352 idx = atomic_inc_return(&map->next_elt);
353 if (idx < map->max_elts) {
354 elt = *(TRACING_MAP_ELT(map->elts, idx));
355 if (map->ops && map->ops->elt_init)
356 map->ops->elt_init(elt);
357 }
358
359 return elt;
360}
361
362static void tracing_map_free_elts(struct tracing_map *map)
363{
364 unsigned int i;
365
366 if (!map->elts)
367 return;
368
369 for (i = 0; i < map->max_elts; i++) {
370 tracing_map_elt_free(*(TRACING_MAP_ELT(map->elts, i)));
371 *(TRACING_MAP_ELT(map->elts, i)) = NULL;
372 }
373
374 tracing_map_array_free(map->elts);
375 map->elts = NULL;
376}
377
378static int tracing_map_alloc_elts(struct tracing_map *map)
379{
380 unsigned int i;
381
382 map->elts = tracing_map_array_alloc(map->max_elts,
383 sizeof(struct tracing_map_elt *));
384 if (!map->elts)
385 return -ENOMEM;
386
387 for (i = 0; i < map->max_elts; i++) {
388 *(TRACING_MAP_ELT(map->elts, i)) = tracing_map_elt_alloc(map);
389 if (IS_ERR(*(TRACING_MAP_ELT(map->elts, i)))) {
390 *(TRACING_MAP_ELT(map->elts, i)) = NULL;
391 tracing_map_free_elts(map);
392
393 return -ENOMEM;
394 }
395 }
396
397 return 0;
398}
399
400static inline bool keys_match(void *key, void *test_key, unsigned key_size)
401{
402 bool match = true;
403
404 if (memcmp(key, test_key, key_size))
405 match = false;
406
407 return match;
408}
409
410static inline struct tracing_map_elt *
411__tracing_map_insert(struct tracing_map *map, void *key, bool lookup_only)
412{
413 u32 idx, key_hash, test_key;
414 struct tracing_map_entry *entry;
415
416 key_hash = jhash(key, map->key_size, 0);
417 if (key_hash == 0)
418 key_hash = 1;
419 idx = key_hash >> (32 - (map->map_bits + 1));
420
421 while (1) {
422 idx &= (map->map_size - 1);
423 entry = TRACING_MAP_ENTRY(map->map, idx);
424 test_key = entry->key;
425
426 if (test_key && test_key == key_hash && entry->val &&
427 keys_match(key, entry->val->key, map->key_size)) {
428 atomic64_inc(&map->hits);
429 return entry->val;
430 }
431
432 if (!test_key) {
433 if (lookup_only)
434 break;
435
436 if (!cmpxchg(&entry->key, 0, key_hash)) {
437 struct tracing_map_elt *elt;
438
439 elt = get_free_elt(map);
440 if (!elt) {
441 atomic64_inc(&map->drops);
442 entry->key = 0;
443 break;
444 }
445
446 memcpy(elt->key, key, map->key_size);
447 entry->val = elt;
448 atomic64_inc(&map->hits);
449
450 return entry->val;
451 }
452 }
453
454 idx++;
455 }
456
457 return NULL;
458}
459
460/**
461 * tracing_map_insert - Insert key and/or retrieve val from a tracing_map
462 * @map: The tracing_map to insert into
463 * @key: The key to insert
464 *
465 * Inserts a key into a tracing_map and creates and returns a new
466 * tracing_map_elt for it, or if the key has already been inserted by
467 * a previous call, returns the tracing_map_elt already associated
468 * with it. When the map was created, the number of elements to be
469 * allocated for the map was specified (internally maintained as
470 * 'max_elts' in struct tracing_map), and that number of
471 * tracing_map_elts was created by tracing_map_init(). This is the
472 * pre-allocated pool of tracing_map_elts that tracing_map_insert()
473 * will allocate from when adding new keys. Once that pool is
474 * exhausted, tracing_map_insert() is useless and will return NULL to
475 * signal that state. There are two user-visible tracing_map
476 * variables, 'hits' and 'drops', which are updated by this function.
477 * Every time an element is either successfully inserted or retrieved,
478 * the 'hits' value is incremented. Every time an element insertion
479 * fails, the 'drops' value is incremented.
480 *
481 * This is a lock-free tracing map insertion function implementing a
482 * modified form of Cliff Click's basic insertion algorithm. It
483 * requires the table size be a power of two. To prevent any
484 * possibility of an infinite loop, we always make the internal table
485 * size double the requested table size (max_elts * 2).
486 * Likewise, we never reuse a slot or resize or delete elements - when
487 * we've reached max_elts entries, we simply return NULL once we've
488 * run out of entries. Readers can at any point in time traverse the
489 * tracing map and safely access the key/val pairs.
490 *
491 * Return: the tracing_map_elt pointer val associated with the key.
492 * If this was a newly inserted key, the val will be a newly allocated
493 * and associated tracing_map_elt pointer val. If the key wasn't
494 * found and the pool of tracing_map_elts has been exhausted, NULL is
495 * returned and no further insertions will succeed.
496 */
497struct tracing_map_elt *tracing_map_insert(struct tracing_map *map, void *key)
498{
499 return __tracing_map_insert(map, key, false);
500}
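
   In practice the caller only needs to handle the NULL (pool-exhausted)
   case; a hedged sketch of the fast path, mirroring event_hist_trigger()
   earlier in this patch, with sum_idx and val assumed to be in scope:

	struct tracing_map_elt *elt;

	elt = tracing_map_insert(map, key);
	if (elt)	/* NULL once the preallocated elt pool is gone */
		tracing_map_update_sum(elt, sum_idx, val);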
501
502/**
503 * tracing_map_lookup - Retrieve val from a tracing_map
504 * @map: The tracing_map to perform the lookup on
505 * @key: The key to look up
506 *
507 * Looks up key in tracing_map and if found returns the matching
508 * tracing_map_elt. This is a lock-free lookup; see
509 * tracing_map_insert() for details on tracing_map and how it works.
510 * There is one user-visible tracing_map variable, 'hits', which is
511 * updated by this function: every time an element is successfully
512 * retrieved, the 'hits' value is incremented. The 'drops' value is
513 * never updated by a lookup, since a lookup that finds nothing
514 * simply returns NULL without consuming an element from the pool.
515 *
516 * Return: the tracing_map_elt pointer val associated with the key.
517 * If the key wasn't found, NULL is returned.
518 */
519struct tracing_map_elt *tracing_map_lookup(struct tracing_map *map, void *key)
520{
521 return __tracing_map_insert(map, key, true);
522}
523
524/**
525 * tracing_map_destroy - Destroy a tracing_map
526 * @map: The tracing_map to destroy
527 *
528 * Frees a tracing_map along with its associated array of
529 * tracing_map_elts.
530 *
531 * Callers should make sure there are no readers or writers actively
532 * reading or inserting into the map before calling this.
533 */
534void tracing_map_destroy(struct tracing_map *map)
535{
536 if (!map)
537 return;
538
539 tracing_map_free_elts(map);
540
541 tracing_map_array_free(map->map);
542 kfree(map);
543}
544
545/**
546 * tracing_map_clear - Clear a tracing_map
547 * @map: The tracing_map to clear
548 *
549 * Resets the tracing map to a cleared or initial state. The
550 * tracing_map_elts are all cleared, and the array of struct
551 * tracing_map_entry is reset to an initialized state.
552 *
553 * Callers should make sure there are no writers actively inserting
554 * into the map before calling this.
555 */
556void tracing_map_clear(struct tracing_map *map)
557{
558 unsigned int i;
559
560 atomic_set(&map->next_elt, -1);
561 atomic64_set(&map->hits, 0);
562 atomic64_set(&map->drops, 0);
563
564 tracing_map_array_clear(map->map);
565
566 for (i = 0; i < map->max_elts; i++)
567 tracing_map_elt_clear(*(TRACING_MAP_ELT(map->elts, i)));
568}
569
570static void set_sort_key(struct tracing_map *map,
571 struct tracing_map_sort_key *sort_key)
572{
573 map->sort_key = *sort_key;
574}
575
576/**
577 * tracing_map_create - Create a lock-free map and element pool
578 * @map_bits: The size of the map (2 ** map_bits)
579 * @key_size: The size of the key for the map in bytes
580 * @ops: Optional client-defined tracing_map_ops instance
581 * @private_data: Client data associated with the map
582 *
583 * Creates and sets up a map to contain 2 ** map_bits number of
584 * elements (internally maintained as 'max_elts' in struct
585 * tracing_map). Before using, map fields should be added to the map
586 * with tracing_map_add_sum_field() and tracing_map_add_key_field().
587 * tracing_map_init() should then be called to allocate the array of
588 * tracing_map_elts, in order to avoid allocating anything in the map
589 * insertion path. The user-specified map size reflects the maximum
590 * number of elements that can be contained in the table requested by
591 * the user - internally we double that in order to keep the table
592 * sparse and keep collisions manageable.
593 *
594 * A tracing_map is a special-purpose map designed to aggregate or
595 * 'sum' one or more values associated with a specific object of type
596 * tracing_map_elt, which is attached by the map to a given key.
597 *
598 * tracing_map_create() sets up the map itself, and provides
599 * operations for inserting tracing_map_elts, but doesn't allocate the
600 * tracing_map_elts themselves, or provide a means for describing the
601 * keys or sums associated with the tracing_map_elts. All
602 * tracing_map_elts for a given map have the same set of sums and
603 * keys, which are defined by the client using the functions
604 * tracing_map_add_key_field() and tracing_map_add_sum_field(). Once
605 * the fields are defined, the pool of elements allocated for the map
606 * can be created, which occurs when the client code calls
607 * tracing_map_init().
608 *
609 * When tracing_map_init() returns, tracing_map_elt elements can be
610 * inserted into the map using tracing_map_insert(). When called,
611 * tracing_map_insert() grabs a free tracing_map_elt from the pool, or
612 * finds an existing match in the map and in either case returns it.
613 * The client can then use tracing_map_update_sum() and
614 * tracing_map_read_sum() to update or read a given sum field for the
615 * tracing_map_elt.
616 *
617 * The client can at any point retrieve and traverse the current set
618 * of inserted tracing_map_elts in a tracing_map, via
619 * tracing_map_sort_entries(). Sorting can be done on any field,
620 * including keys.
621 *
622 * See tracing_map.h for a description of tracing_map_ops.
623 *
624 * Return: the tracing_map pointer if successful, ERR_PTR if not.
625 */
626struct tracing_map *tracing_map_create(unsigned int map_bits,
627 unsigned int key_size,
628 const struct tracing_map_ops *ops,
629 void *private_data)
630{
631 struct tracing_map *map;
632 unsigned int i;
633
634 if (map_bits < TRACING_MAP_BITS_MIN ||
635 map_bits > TRACING_MAP_BITS_MAX)
636 return ERR_PTR(-EINVAL);
637
638 map = kzalloc(sizeof(*map), GFP_KERNEL);
639 if (!map)
640 return ERR_PTR(-ENOMEM);
641
642 map->map_bits = map_bits;
643 map->max_elts = (1 << map_bits);
644 atomic_set(&map->next_elt, -1);
645
646 map->map_size = (1 << (map_bits + 1));
647 map->ops = ops;
648
649 map->private_data = private_data;
650
651 map->map = tracing_map_array_alloc(map->map_size,
652 sizeof(struct tracing_map_entry));
653 if (!map->map)
654 goto free;
655
656 map->key_size = key_size;
657 for (i = 0; i < TRACING_MAP_KEYS_MAX; i++)
658 map->key_idx[i] = -1;
659 out:
660 return map;
661 free:
662 tracing_map_destroy(map);
663 map = ERR_PTR(-ENOMEM);
664
665 goto out;
666}
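
   The lifecycle described above, condensed into a hedged sketch (error
   handling mostly elided; the single 8-byte unsigned key and the use of
   TRACING_MAP_BITS_DEFAULT are assumptions of the example):

	struct tracing_map *map;
	int sum_idx;

	map = tracing_map_create(TRACING_MAP_BITS_DEFAULT, sizeof(u64),
				 NULL, NULL);
	if (IS_ERR(map))
		return PTR_ERR(map);

	tracing_map_add_key_field(map, 0,
				  tracing_map_cmp_num(sizeof(u64), 0));
	sum_idx = tracing_map_add_sum_field(map);

	if (tracing_map_init(map) < 0)	/* preallocate all elts up front */
		tracing_map_destroy(map);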
667
668/**
669 * tracing_map_init - Allocate and clear a map's tracing_map_elts
670 * @map: The tracing_map to initialize
671 *
672 * Allocates and clears a pool of tracing_map_elts equal to the
673 * user-specified size of 2 ** map_bits (internally maintained as
674 * 'max_elts' in struct tracing_map). Before using, the map fields
675 * should be added to the map with tracing_map_add_sum_field() and
676 * tracing_map_add_key_field(). tracing_map_init() should then be
677 * called to allocate the array of tracing_map_elts, in order to avoid
678 * allocating anything in the map insertion path. The user-specified
679 * map size reflects the max number of elements requested by the user
680 * - internally we double that in order to keep the table sparse and
681 * keep collisions manageable.
682 *
683 * See tracing_map.h for a description of tracing_map_ops.
684 *
685 * Return: 0 if successful, a negative error code if not.
686 */
687int tracing_map_init(struct tracing_map *map)
688{
689 int err;
690
691 if (map->n_fields < 2)
692 return -EINVAL; /* need at least 1 key and 1 val */
693
694 err = tracing_map_alloc_elts(map);
695 if (err)
696 return err;
697
698 tracing_map_clear(map);
699
700 return err;
701}
702
703static int cmp_entries_dup(const struct tracing_map_sort_entry **a,
704 const struct tracing_map_sort_entry **b)
705{
706 int ret = 0;
707
708 if (memcmp((*a)->key, (*b)->key, (*a)->elt->map->key_size))
709 ret = 1;
710
711 return ret;
712}
713
714static int cmp_entries_sum(const struct tracing_map_sort_entry **a,
715 const struct tracing_map_sort_entry **b)
716{
717 const struct tracing_map_elt *elt_a, *elt_b;
718 struct tracing_map_sort_key *sort_key;
719 struct tracing_map_field *field;
720 tracing_map_cmp_fn_t cmp_fn;
721 void *val_a, *val_b;
722 int ret = 0;
723
724 elt_a = (*a)->elt;
725 elt_b = (*b)->elt;
726
727 sort_key = &elt_a->map->sort_key;
728
729 field = &elt_a->fields[sort_key->field_idx];
730 cmp_fn = field->cmp_fn;
731
732 val_a = &elt_a->fields[sort_key->field_idx].sum;
733 val_b = &elt_b->fields[sort_key->field_idx].sum;
734
735 ret = cmp_fn(val_a, val_b);
736 if (sort_key->descending)
737 ret = -ret;
738
739 return ret;
740}
741
742static int cmp_entries_key(const struct tracing_map_sort_entry **a,
743 const struct tracing_map_sort_entry **b)
744{
745 const struct tracing_map_elt *elt_a, *elt_b;
746 struct tracing_map_sort_key *sort_key;
747 struct tracing_map_field *field;
748 tracing_map_cmp_fn_t cmp_fn;
749 void *val_a, *val_b;
750 int ret = 0;
751
752 elt_a = (*a)->elt;
753 elt_b = (*b)->elt;
754
755 sort_key = &elt_a->map->sort_key;
756
757 field = &elt_a->fields[sort_key->field_idx];
758
759 cmp_fn = field->cmp_fn;
760
761 val_a = elt_a->key + field->offset;
762 val_b = elt_b->key + field->offset;
763
764 ret = cmp_fn(val_a, val_b);
765 if (sort_key->descending)
766 ret = -ret;
767
768 return ret;
769}
770
771static void destroy_sort_entry(struct tracing_map_sort_entry *entry)
772{
773 if (!entry)
774 return;
775
776 if (entry->elt_copied)
777 tracing_map_elt_free(entry->elt);
778
779 kfree(entry);
780}
781
782/**
783 * tracing_map_destroy_sort_entries - Destroy an array of sort entries
784 * @entries: The entries to destroy
785 * @n_entries: The number of entries in the array
786 *
787 * Destroy the elements returned by a tracing_map_sort_entries() call.
788 */
789void tracing_map_destroy_sort_entries(struct tracing_map_sort_entry **entries,
790 unsigned int n_entries)
791{
792 unsigned int i;
793
794 for (i = 0; i < n_entries; i++)
795 destroy_sort_entry(entries[i]);
796
797 vfree(entries);
798}
799
800static struct tracing_map_sort_entry *
801create_sort_entry(void *key, struct tracing_map_elt *elt)
802{
803 struct tracing_map_sort_entry *sort_entry;
804
805 sort_entry = kzalloc(sizeof(*sort_entry), GFP_KERNEL);
806 if (!sort_entry)
807 return NULL;
808
809 sort_entry->key = key;
810 sort_entry->elt = elt;
811
812 return sort_entry;
813}
814
815static struct tracing_map_elt *copy_elt(struct tracing_map_elt *elt)
816{
817 struct tracing_map_elt *dup_elt;
818 unsigned int i;
819
820 dup_elt = tracing_map_elt_alloc(elt->map);
821 if (IS_ERR(dup_elt))
822 return NULL;
823
824 if (elt->map->ops && elt->map->ops->elt_copy)
825 elt->map->ops->elt_copy(dup_elt, elt);
826
827 dup_elt->private_data = elt->private_data;
828 memcpy(dup_elt->key, elt->key, elt->map->key_size);
829
830 for (i = 0; i < elt->map->n_fields; i++) {
831 atomic64_set(&dup_elt->fields[i].sum,
832 atomic64_read(&elt->fields[i].sum));
833 dup_elt->fields[i].cmp_fn = elt->fields[i].cmp_fn;
834 }
835
836 return dup_elt;
837}
838
839static int merge_dup(struct tracing_map_sort_entry **sort_entries,
840 unsigned int target, unsigned int dup)
841{
842 struct tracing_map_elt *target_elt, *elt;
843 bool first_dup = (target - dup) == 1;
844 int i;
845
846 if (first_dup) {
847 elt = sort_entries[target]->elt;
848 target_elt = copy_elt(elt);
849 if (!target_elt)
850 return -ENOMEM;
851 sort_entries[target]->elt = target_elt;
852 sort_entries[target]->elt_copied = true;
853 } else
854 target_elt = sort_entries[target]->elt;
855
856 elt = sort_entries[dup]->elt;
857
858 for (i = 0; i < elt->map->n_fields; i++)
859 atomic64_add(atomic64_read(&elt->fields[i].sum),
860 &target_elt->fields[i].sum);
861
862 sort_entries[dup]->dup = true;
863
864 return 0;
865}
866
867static int merge_dups(struct tracing_map_sort_entry **sort_entries,
868 int n_entries, unsigned int key_size)
869{
870 unsigned int dups = 0, total_dups = 0;
871 int err, i, j;
872 void *key;
873
874 if (n_entries < 2)
875 return total_dups;
876
877 sort(sort_entries, n_entries, sizeof(struct tracing_map_sort_entry *),
878 (int (*)(const void *, const void *))cmp_entries_dup, NULL);
879
880 key = sort_entries[0]->key;
881 for (i = 1; i < n_entries; i++) {
882 if (!memcmp(sort_entries[i]->key, key, key_size)) {
883 dups++; total_dups++;
884 err = merge_dup(sort_entries, i - dups, i);
885 if (err)
886 return err;
887 continue;
888 }
889 key = sort_entries[i]->key;
890 dups = 0;
891 }
892
893 if (!total_dups)
894 return total_dups;
895
896 for (i = 0, j = 0; i < n_entries; i++) {
897 if (!sort_entries[i]->dup) {
898 sort_entries[j] = sort_entries[i];
899 if (j++ != i)
900 sort_entries[i] = NULL;
901 } else {
902 destroy_sort_entry(sort_entries[i]);
903 sort_entries[i] = NULL;
904 }
905 }
906
907 return total_dups;
908}
909
910static bool is_key(struct tracing_map *map, unsigned int field_idx)
911{
912 unsigned int i;
913
914 for (i = 0; i < map->n_keys; i++)
915 if (map->key_idx[i] == field_idx)
916 return true;
917 return false;
918}
919
920static void sort_secondary(struct tracing_map *map,
921 const struct tracing_map_sort_entry **entries,
922 unsigned int n_entries,
923 struct tracing_map_sort_key *primary_key,
924 struct tracing_map_sort_key *secondary_key)
925{
926 int (*primary_fn)(const struct tracing_map_sort_entry **,
927 const struct tracing_map_sort_entry **);
928 int (*secondary_fn)(const struct tracing_map_sort_entry **,
929 const struct tracing_map_sort_entry **);
930 unsigned i, start = 0, n_sub = 1;
931
932 if (is_key(map, primary_key->field_idx))
933 primary_fn = cmp_entries_key;
934 else
935 primary_fn = cmp_entries_sum;
936
937 if (is_key(map, secondary_key->field_idx))
938 secondary_fn = cmp_entries_key;
939 else
940 secondary_fn = cmp_entries_sum;
941
942 for (i = 0; i < n_entries - 1; i++) {
943 const struct tracing_map_sort_entry **a = &entries[i];
944 const struct tracing_map_sort_entry **b = &entries[i + 1];
945
946 if (primary_fn(a, b) == 0) {
947 n_sub++;
948 if (i < n_entries - 2)
949 continue;
950 }
951
952 if (n_sub < 2) {
953 start = i + 1;
954 n_sub = 1;
955 continue;
956 }
957
958 set_sort_key(map, secondary_key);
959 sort(&entries[start], n_sub,
960 sizeof(struct tracing_map_sort_entry *),
961 (int (*)(const void *, const void *))secondary_fn, NULL);
962 set_sort_key(map, primary_key);
963
964 start = i + 1;
965 n_sub = 1;
966 }
967}
968
969/**
970 * tracing_map_sort_entries - Sort the current set of tracing_map_elts in a map
971 * @map: The tracing_map
972 * @sort_keys: The sort keys to use for sorting; n_sort_keys gives their count
973 * @sort_entries: outval: pointer to allocated and sorted array of entries
974 *
975 * tracing_map_sort_entries() sorts the current set of entries in the
976 * map and returns the list of tracing_map_sort_entries containing
977 * them to the client in the sort_entries param. The client can
978 * access the struct tracing_map_elt element of interest directly as
979 * the 'elt' field of a returned struct tracing_map_sort_entry object.
980 *
981 * Each sort key has only two fields: field_idx and descending. 'field_idx'
982 * refers to the index of the field added via tracing_map_add_sum_field() or
983 * tracing_map_add_key_field() when the tracing_map was initialized.
984 * 'descending' is a flag that if set reverses the sort order, which
985 * by default is ascending.
986 *
987 * The client should not hold on to the returned array but should use
988 * it and call tracing_map_destroy_sort_entries() when done.
989 *
990 * Return: the number of sort_entries in the struct tracing_map_sort_entry
991 * array, negative on error
992 */
993int tracing_map_sort_entries(struct tracing_map *map,
994 struct tracing_map_sort_key *sort_keys,
995 unsigned int n_sort_keys,
996 struct tracing_map_sort_entry ***sort_entries)
997{
998 int (*cmp_entries_fn)(const struct tracing_map_sort_entry **,
999 const struct tracing_map_sort_entry **);
1000 struct tracing_map_sort_entry *sort_entry, **entries;
1001 int i, n_entries, ret;
1002
1003 entries = vmalloc(map->max_elts * sizeof(sort_entry));
1004 if (!entries)
1005 return -ENOMEM;
1006
1007 for (i = 0, n_entries = 0; i < map->map_size; i++) {
1008 struct tracing_map_entry *entry;
1009
1010 entry = TRACING_MAP_ENTRY(map->map, i);
1011
1012 if (!entry->key || !entry->val)
1013 continue;
1014
1015 entries[n_entries] = create_sort_entry(entry->val->key,
1016 entry->val);
1017 if (!entries[n_entries++]) {
1018 ret = -ENOMEM;
1019 goto free;
1020 }
1021 }
1022
1023 if (n_entries == 0) {
1024 ret = 0;
1025 goto free;
1026 }
1027
1028 if (n_entries == 1) {
1029 *sort_entries = entries;
1030 return 1;
1031 }
1032
1033 ret = merge_dups(entries, n_entries, map->key_size);
1034 if (ret < 0)
1035 goto free;
1036 n_entries -= ret;
1037
1038 if (is_key(map, sort_keys[0].field_idx))
1039 cmp_entries_fn = cmp_entries_key;
1040 else
1041 cmp_entries_fn = cmp_entries_sum;
1042
1043 set_sort_key(map, &sort_keys[0]);
1044
1045 sort(entries, n_entries, sizeof(struct tracing_map_sort_entry *),
1046 (int (*)(const void *, const void *))cmp_entries_fn, NULL);
1047
1048 if (n_sort_keys > 1)
1049 sort_secondary(map,
1050 (const struct tracing_map_sort_entry **)entries,
1051 n_entries,
1052 &sort_keys[0],
1053 &sort_keys[1]);
1054
1055 *sort_entries = entries;
1056
1057 return n_entries;
1058 free:
1059 tracing_map_destroy_sort_entries(entries, n_entries);
1060
1061 return ret;
1062}
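
To illustrate the consumer side, a hedged sketch of sorting and reading the
results (assuming a setup like the one shown earlier: the key field at index
0 and a single sum field at index 1):

    #include <linux/printk.h>
    #include "tracing_map.h"

    static void my_dump_sorted(struct tracing_map *map)
    {
        struct tracing_map_sort_entry **entries;
        struct tracing_map_sort_key sort_key = {
            .field_idx  = 1,        /* sort on the sum field */
            .descending = true,     /* largest sums first */
        };
        int i, n_entries;

        n_entries = tracing_map_sort_entries(map, &sort_key, 1, &entries);
        if (n_entries < 0)
            return;

        for (i = 0; i < n_entries; i++)
            pr_info("sum: %llu\n",
                    (unsigned long long)tracing_map_read_sum(entries[i]->elt, 1));

        /* the sorted array is a snapshot and must be freed by the caller */
        tracing_map_destroy_sort_entries(entries, n_entries);
    }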
diff --git a/kernel/trace/tracing_map.h b/kernel/trace/tracing_map.h
new file mode 100644
index 000000000000..618838f5f30a
--- /dev/null
+++ b/kernel/trace/tracing_map.h
@@ -0,0 +1,283 @@
1#ifndef __TRACING_MAP_H
2#define __TRACING_MAP_H
3
4#define TRACING_MAP_BITS_DEFAULT 11
5#define TRACING_MAP_BITS_MAX 17
6#define TRACING_MAP_BITS_MIN 7
7
8#define TRACING_MAP_KEYS_MAX 2
9#define TRACING_MAP_VALS_MAX 3
10#define TRACING_MAP_FIELDS_MAX (TRACING_MAP_KEYS_MAX + \
11 TRACING_MAP_VALS_MAX)
12#define TRACING_MAP_SORT_KEYS_MAX 2
13
14typedef int (*tracing_map_cmp_fn_t) (void *val_a, void *val_b);
15
16/*
17 * This is an overview of the tracing_map data structures and how they
18 * relate to the tracing_map API. The details of the algorithms
19 * aren't discussed here - this is just a general overview of the data
20 * structures and how they interact with the API.
21 *
22 * The central data structure of the tracing_map is an initially
23 * zeroed array of struct tracing_map_entry (stored in the map field
24 * of struct tracing_map). tracing_map_entry is a very simple data
25 * structure containing only two fields: a 32-bit unsigned 'key'
26 * variable and a pointer named 'val'. This array of struct
27 * tracing_map_entry is essentially a hash table which will be
28 * modified by a single function, tracing_map_insert(), but which can
29 * be traversed and read by a user at any time (though the user does
30 * this indirectly via an array of tracing_map_sort_entry - see the
31 * explanation of that data structure in the discussion of the
32 * sorting-related data structures below).
33 *
34 * The central function of the tracing_map API is
35 * tracing_map_insert(). tracing_map_insert() hashes the
36 * arbitrarily-sized key passed into it into a 32-bit unsigned key.
37 * It then uses this key, truncated to the array size, as an index
38 * into the array of tracing_map_entries. If the value of the 'key'
39 * field of the tracing_map_entry found at that location is 0, then
40 * that entry is considered to be free and can be claimed, by
41 * replacing the 0 in the 'key' field of the tracing_map_entry with
42 * the new 32-bit hashed key. Once claimed, that tracing_map_entry's
43 * 'val' field is then used to store a unique element which will be
44 * forever associated with that 32-bit hashed key in the
45 * tracing_map_entry.
46 *
47 * That unique element now in the tracing_map_entry's 'val' field is
48 * an instance of tracing_map_elt, where 'elt' in the latter part of
49 * that variable name is short for 'element'. The purpose of a
50 * tracing_map_elt is to hold values specific to the particular
51 * 32-bit hashed key it's associated with: things such as the unique
52 * set of aggregated sums associated with the 32-bit hashed key, along
53 * with a copy of the full key associated with the entry, which was
54 * used to produce the 32-bit hashed key.
55 *
56 * When tracing_map_create() is called to create the tracing map, the
57 * user specifies (indirectly via the map_bits param, the details are
58 * unimportant for this discussion) the maximum number of elements
59 * that the map can hold (stored in the max_elts field of struct
60 * tracing_map). This is the maximum possible number of
61 * tracing_map_entries in the tracing_map_entry array which can be
62 * 'claimed' as described in the above discussion, and therefore is
63 * also the maximum number of tracing_map_elts that can be associated
64 * with the tracing_map_entry array in the tracing_map. Because of
65 * the way the insertion algorithm works, the size of the allocated
66 * tracing_map_entry array is always twice the maximum number of
67 * elements (2 * max_elts). This value is stored in the map_size
68 * field of struct tracing_map.
69 *
70 * Because tracing_map_insert() needs to work from any context,
71 * including from within the memory allocation functions themselves,
72 * both the tracing_map_entry array and a pool of max_elts
73 * tracing_map_elts are pre-allocated before any call is made to
74 * tracing_map_insert().
75 *
76 * The tracing_map_entry array is allocated as a single block by
77 * tracing_map_create().
78 *
79 * Because the tracing_map_elts are much larger objects and can't
80 * generally be allocated together as a single large array without
81 * failure, they're allocated individually, by tracing_map_init().
82 *
83 * The pool of tracing_map_elts is allocated by tracing_map_init()
84 * rather than by tracing_map_create() because at the time
85 * tracing_map_create() is called, there isn't enough information to
86 * create the tracing_map_elts. Specifically, the user first needs to
87 * tell the tracing_map implementation how many fields the
88 * tracing_map_elts contain, and which types of fields they are (key
89 * or sum). The user does this via the tracing_map_add_sum_field()
90 * and tracing_map_add_key_field() functions, following which the user
91 * calls tracing_map_init() to finish up the tracing map setup. The
92 * array holding the pointers which make up the pre-allocated pool of
93 * tracing_map_elts is allocated as a single block and is stored in
94 * the elts field of struct tracing_map.
95 *
96 * There is also a set of structures used for sorting that might
97 * benefit from some minimal explanation.
98 *
99 * struct tracing_map_sort_key is used to drive the sort at any given
100 * time. By 'any given time' we mean that a different
101 * tracing_map_sort_key will be used at different times depending on
102 * whether the sort currently being performed is a primary or a
103 * secondary sort.
104 *
105 * The sort key is very simple, consisting of the field index of the
106 * tracing_map_elt field to sort on (which the user saved when adding
107 * the field), and whether the sort should be done in an ascending or
108 * descending order.
109 *
110 * For the convenience of the sorting code, a tracing_map_sort_entry
111 * is created for each tracing_map_elt, again individually allocated
112 * to avoid failures that might be expected if allocated as a single
113 * large array of struct tracing_map_sort_entry.
114 * tracing_map_sort_entry instances are the objects expected by the
115 * various internal sorting functions, and are also what the user
116 * ultimately receives after calling tracing_map_sort_entries().
117 * Because it doesn't make sense for users to access an unordered and
118 * sparsely populated tracing_map directly, the
119 * tracing_map_sort_entries() function is provided so that users can
120 * retrieve a sorted list of all existing elements. In addition to
121 * the associated tracing_map_elt 'elt' field contained within the
122 * tracing_map_sort_entry, which is the object of interest to the
123 * user, tracing_map_sort_entry objects contain a number of additional
124 * fields which are used for caching and internal purposes and can
125 * safely be ignored.
126*/
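
A hedged sketch of that insertion fast path as a client would use it, e.g.
from an event callback (the handler name and the sum-field index are
assumptions carried over from a setup like the one shown earlier):

    #include <linux/types.h>
    #include "tracing_map.h"

    #define MY_SUM_FIELD_IDX 1	/* assumed index from tracing_map_add_sum_field() */

    static void my_event_hit(struct tracing_map *map, void *key, u64 delta)
    {
        struct tracing_map_elt *elt;

        /* safe from any context: no allocation happens on this path */
        elt = tracing_map_insert(map, key);
        if (!elt)
            return;	/* map full - the hit is dropped */

        tracing_map_update_sum(elt, MY_SUM_FIELD_IDX, delta);
    }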
127
128struct tracing_map_field {
129 tracing_map_cmp_fn_t cmp_fn;
130 union {
131 atomic64_t sum;
132 unsigned int offset;
133 };
134};
135
136struct tracing_map_elt {
137 struct tracing_map *map;
138 struct tracing_map_field *fields;
139 void *key;
140 void *private_data;
141};
142
143struct tracing_map_entry {
144 u32 key;
145 struct tracing_map_elt *val;
146};
147
148struct tracing_map_sort_key {
149 unsigned int field_idx;
150 bool descending;
151};
152
153struct tracing_map_sort_entry {
154 void *key;
155 struct tracing_map_elt *elt;
156 bool elt_copied;
157 bool dup;
158};
159
160struct tracing_map_array {
161 unsigned int entries_per_page;
162 unsigned int entry_size_shift;
163 unsigned int entry_shift;
164 unsigned int entry_mask;
165 unsigned int n_pages;
166 void **pages;
167};
168
169#define TRACING_MAP_ARRAY_ELT(array, idx) \
170 (array->pages[idx >> array->entry_shift] + \
171 ((idx & array->entry_mask) << array->entry_size_shift))
172
173#define TRACING_MAP_ENTRY(array, idx) \
174 ((struct tracing_map_entry *)TRACING_MAP_ARRAY_ELT(array, idx))
175
176#define TRACING_MAP_ELT(array, idx) \
177 ((struct tracing_map_elt **)TRACING_MAP_ARRAY_ELT(array, idx))
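
To make the index arithmetic concrete, a worked example with assumed
parameters (4096-byte pages and 16-byte tracing_map_entries, giving
entries_per_page = 256, entry_shift = 8, entry_mask = 0xff and
entry_size_shift = 4):

    /*
     * TRACING_MAP_ARRAY_ELT(array, 600) resolves as:
     *
     *   page   = 600 >> 8          = 2
     *   offset = (600 & 0xff) << 4 = 88 << 4 = 1408
     *
     * i.e. the 88th 16-byte entry in array->pages[2],
     * or array->pages[2] + 1408.
     */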
178
179struct tracing_map {
180 unsigned int key_size;
181 unsigned int map_bits;
182 unsigned int map_size;
183 unsigned int max_elts;
184 atomic_t next_elt;
185 struct tracing_map_array *elts;
186 struct tracing_map_array *map;
187 const struct tracing_map_ops *ops;
188 void *private_data;
189 struct tracing_map_field fields[TRACING_MAP_FIELDS_MAX];
190 unsigned int n_fields;
191 int key_idx[TRACING_MAP_KEYS_MAX];
192 unsigned int n_keys;
193 struct tracing_map_sort_key sort_key;
194 atomic64_t hits;
195 atomic64_t drops;
196};
197
198/**
199 * struct tracing_map_ops - callbacks for tracing_map
200 *
201 * The methods in this structure define callback functions for various
202 * operations on a tracing_map or objects related to a tracing_map.
203 *
204 * For a detailed description of tracing_map_elt objects please see
205 * the overview of tracing_map data structures at the beginning of
206 * this file.
207 *
208 * All the methods below are optional.
209 *
210 * @elt_alloc: When a tracing_map_elt is allocated, this function, if
211 * defined, will be called and gives clients the opportunity to
212 * allocate additional data and attach it to the element
213 * (tracing_map_elt->private_data is meant for that purpose).
214 * Element allocation occurs before tracing begins, when the
215 * tracing_map_init() call is made by client code.
216 *
217 * @elt_copy: At certain points in the lifetime of an element, it may
218 * need to be copied. The copy should include a copy of the
219 * client-allocated data, which can be copied into the 'to'
220 * element from the 'from' element.
221 *
222 * @elt_free: When a tracing_map_elt is freed, this function is called
223 * and allows client-allocated per-element data to be freed.
224 *
225 * @elt_clear: This callback allows per-element client-defined data to
226 * be cleared, if applicable.
227 *
228 * @elt_init: This callback allows per-element client-defined data to
229 * be initialized when used i.e. when the element is actually
230 * claimed by tracing_map_insert() in the context of the map
231 * insertion.
232 */
233struct tracing_map_ops {
234 int (*elt_alloc)(struct tracing_map_elt *elt);
235 void (*elt_copy)(struct tracing_map_elt *to,
236 struct tracing_map_elt *from);
237 void (*elt_free)(struct tracing_map_elt *elt);
238 void (*elt_clear)(struct tracing_map_elt *elt);
239 void (*elt_init)(struct tracing_map_elt *elt);
240};
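
A minimal sketch of a client filling in these callbacks, assuming it wants
to attach a small per-element buffer via private_data (the struct name and
its contents are hypothetical):

    #include <linux/slab.h>
    #include <linux/string.h>
    #include "tracing_map.h"

    struct my_elt_data {
        char comm[16];	/* hypothetical per-element payload */
    };

    static int my_elt_alloc(struct tracing_map_elt *elt)
    {
        /* runs at tracing_map_init() time, so GFP_KERNEL is fine */
        elt->private_data = kzalloc(sizeof(struct my_elt_data), GFP_KERNEL);
        return elt->private_data ? 0 : -ENOMEM;
    }

    static void my_elt_copy(struct tracing_map_elt *to,
                            struct tracing_map_elt *from)
    {
        memcpy(to->private_data, from->private_data,
               sizeof(struct my_elt_data));
    }

    static void my_elt_free(struct tracing_map_elt *elt)
    {
        kfree(elt->private_data);
    }

    static const struct tracing_map_ops my_map_ops = {
        .elt_alloc = my_elt_alloc,
        .elt_copy  = my_elt_copy,
        .elt_free  = my_elt_free,
    };

my_map_ops would then be passed as the 'ops' argument to tracing_map_create().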
241
242extern struct tracing_map *
243tracing_map_create(unsigned int map_bits,
244 unsigned int key_size,
245 const struct tracing_map_ops *ops,
246 void *private_data);
247extern int tracing_map_init(struct tracing_map *map);
248
249extern int tracing_map_add_sum_field(struct tracing_map *map);
250extern int tracing_map_add_key_field(struct tracing_map *map,
251 unsigned int offset,
252 tracing_map_cmp_fn_t cmp_fn);
253
254extern void tracing_map_destroy(struct tracing_map *map);
255extern void tracing_map_clear(struct tracing_map *map);
256
257extern struct tracing_map_elt *
258tracing_map_insert(struct tracing_map *map, void *key);
259extern struct tracing_map_elt *
260tracing_map_lookup(struct tracing_map *map, void *key);
261
262extern tracing_map_cmp_fn_t tracing_map_cmp_num(int field_size,
263 int field_is_signed);
264extern int tracing_map_cmp_string(void *val_a, void *val_b);
265extern int tracing_map_cmp_none(void *val_a, void *val_b);
266
267extern void tracing_map_update_sum(struct tracing_map_elt *elt,
268 unsigned int i, u64 n);
269extern u64 tracing_map_read_sum(struct tracing_map_elt *elt, unsigned int i);
270extern void tracing_map_set_field_descr(struct tracing_map *map,
271 unsigned int i,
272 unsigned int key_offset,
273 tracing_map_cmp_fn_t cmp_fn);
274extern int
275tracing_map_sort_entries(struct tracing_map *map,
276 struct tracing_map_sort_key *sort_keys,
277 unsigned int n_sort_keys,
278 struct tracing_map_sort_entry ***sort_entries);
279
280extern void
281tracing_map_destroy_sort_entries(struct tracing_map_sort_entry **entries,
282 unsigned int n_entries);
283#endif /* __TRACING_MAP_H */
diff --git a/tools/testing/selftests/ftrace/test.d/functions b/tools/testing/selftests/ftrace/test.d/functions
index 5d8cd06d920f..c37262f6c269 100644
--- a/tools/testing/selftests/ftrace/test.d/functions
+++ b/tools/testing/selftests/ftrace/test.d/functions
@@ -14,3 +14,12 @@ enable_tracing() { # start trace recording
14reset_tracer() { # reset the current tracer 14reset_tracer() { # reset the current tracer
15 echo nop > current_tracer 15 echo nop > current_tracer
16} 16}
17
18reset_trigger() { # reset all current setting triggers
19 grep -v ^# events/*/*/trigger |
20 while read line; do
21 cmd=`echo $line | cut -f2- -d: | cut -f1 -d" "`
22 echo "!$cmd" > `echo $line | cut -f1 -d:`
23 done
24}
25
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-eventonoff.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-eventonoff.tc
new file mode 100644
index 000000000000..1a9445021bf1
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-eventonoff.tc
@@ -0,0 +1,64 @@
1#!/bin/sh
2# description: event trigger - test event enable/disable trigger
3
4do_reset() {
5 reset_trigger
6 echo > set_event
7 clear_trace
8}
9
10fail() { #msg
11 do_reset
12 echo $1
13 exit $FAIL
14}
15
16if [ ! -f set_event -o ! -d events/sched ]; then
17 echo "event tracing is not supported"
18 exit_unsupported
19fi
20
21if [ ! -f events/sched/sched_process_fork/trigger ]; then
22 echo "event trigger is not supported"
23 exit_unsupported
24fi
25
26reset_tracer
27do_reset
28
29FEATURE=`grep enable_event events/sched/sched_process_fork/trigger`
30if [ -z "$FEATURE" ]; then
31 echo "event enable/disable trigger is not supported"
32 exit_unsupported
33fi
34
35echo "Test enable_event trigger"
36echo 0 > events/sched/sched_switch/enable
37echo 'enable_event:sched:sched_switch' > events/sched/sched_process_fork/trigger
38( echo "forked")
39if [ `cat events/sched/sched_switch/enable` != '1*' ]; then
40 fail "enable_event trigger on sched_process_fork did not work"
41fi
42
43reset_trigger
44
45echo "Test disable_event trigger"
46echo 1 > events/sched/sched_switch/enable
47echo 'disable_event:sched:sched_switch' > events/sched/sched_process_fork/trigger
48( echo "forked")
49if [ `cat events/sched/sched_switch/enable` != '0*' ]; then
50 fail "disable_event trigger on sched_process_fork did not work"
51fi
52
53reset_trigger
54
55echo "Test semantic error for event enable/disable trigger"
56! echo 'enable_event:nogroup:noevent' > events/sched/sched_process_fork/trigger
57! echo 'disable_event+1' > events/sched/sched_process_fork/trigger
58echo 'enable_event:sched:sched_switch' > events/sched/sched_process_fork/trigger
59! echo 'enable_event:sched:sched_switch' > events/sched/sched_process_fork/trigger
60! echo 'disable_event:sched:sched_switch' > events/sched/sched_process_fork/trigger
61
62do_reset
63
64exit 0
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-filter.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-filter.tc
new file mode 100644
index 000000000000..514e466e198b
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-filter.tc
@@ -0,0 +1,59 @@
1#!/bin/sh
2# description: event trigger - test trigger filter
3
4do_reset() {
5 reset_trigger
6 echo > set_event
7 clear_trace
8}
9
10fail() { #msg
11 do_reset
12 echo $1
13 exit $FAIL
14}
15
16if [ ! -f set_event -o ! -d events/sched ]; then
17 echo "event tracing is not supported"
18 exit_unsupported
19fi
20
21if [ ! -f events/sched/sched_process_fork/trigger ]; then
22 echo "event trigger is not supported"
23 exit_unsupported
24fi
25
26reset_tracer
27do_reset
28
29echo "Test trigger filter"
30echo 1 > tracing_on
31echo 'traceoff if child_pid == 0' > events/sched/sched_process_fork/trigger
32( echo "forked")
33if [ `cat tracing_on` -ne 1 ]; then
34 fail "traceoff trigger on sched_process_fork did not work"
35fi
36
37reset_trigger
38
39echo "Test semantic error for trigger filter"
40! echo 'traceoff if a' > events/sched/sched_process_fork/trigger
41! echo 'traceoff if common_pid=0' > events/sched/sched_process_fork/trigger
42! echo 'traceoff if common_pid==b' > events/sched/sched_process_fork/trigger
43echo 'traceoff if common_pid == 0' > events/sched/sched_process_fork/trigger
44echo '!traceoff' > events/sched/sched_process_fork/trigger
45! echo 'traceoff if common_pid == child_pid' > events/sched/sched_process_fork/trigger
46echo 'traceoff if common_pid <= 0' > events/sched/sched_process_fork/trigger
47echo '!traceoff' > events/sched/sched_process_fork/trigger
48echo 'traceoff if common_pid >= 0' > events/sched/sched_process_fork/trigger
49echo '!traceoff' > events/sched/sched_process_fork/trigger
50echo 'traceoff if parent_pid >= 0 && child_pid >= 0' > events/sched/sched_process_fork/trigger
51echo '!traceoff' > events/sched/sched_process_fork/trigger
52echo 'traceoff if parent_pid >= 0 || child_pid >= 0' > events/sched/sched_process_fork/trigger
53echo '!traceoff' > events/sched/sched_process_fork/trigger
54
55
56
57do_reset
58
59exit 0
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-hist-mod.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-hist-mod.tc
new file mode 100644
index 000000000000..c2b61c4fda11
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-hist-mod.tc
@@ -0,0 +1,75 @@
1#!/bin/sh
2# description: event trigger - test histogram modifiers
3
4do_reset() {
5 reset_trigger
6 echo > set_event
7 clear_trace
8}
9
10fail() { #msg
11 do_reset
12 echo $1
13 exit $FAIL
14}
15
16if [ ! -f set_event -o ! -d events/sched ]; then
17 echo "event tracing is not supported"
18 exit_unsupported
19fi
20
21if [ ! -f events/sched/sched_process_fork/trigger ]; then
22 echo "event trigger is not supported"
23 exit_unsupported
24fi
25
26reset_tracer
27do_reset
28
29FEATURE=`grep hist events/sched/sched_process_fork/trigger`
30if [ -z "$FEATURE" ]; then
31 echo "hist trigger is not supported"
32 exit_unsupported
33fi
34
35echo "Test histogram with execname modifier"
36
37echo 'hist:keys=common_pid.execname' > events/sched/sched_process_fork/trigger
38for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done
39COMM=`cat /proc/$$/comm`
40grep "common_pid: $COMM" events/sched/sched_process_fork/hist > /dev/null || \
41 fail "execname modifier on sched_process_fork did not work"
42
43reset_trigger
44
45echo "Test histogram with hex modifier"
46
47echo 'hist:keys=parent_pid.hex' > events/sched/sched_process_fork/trigger
48for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done
49# Note that $$ is the parent pid. $PID is current PID.
50HEX=`printf %x $PID`
51grep "parent_pid: $HEX" events/sched/sched_process_fork/hist > /dev/null || \
52 fail "hex modifier on sched_process_fork did not work"
53
54reset_trigger
55
56echo "Test histogram with syscall modifier"
57
58echo 'hist:keys=id.syscall' > events/raw_syscalls/sys_exit/trigger
59for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done
60grep "id: sys_" events/raw_syscalls/sys_exit/hist > /dev/null || \
61 fail "syscall modifier on raw_syscalls/sys_exit did not work"
62
63
64reset_trigger
65
66echo "Test histgram with log2 modifier"
67
68echo 'hist:keys=bytes_req.log2' > events/kmem/kmalloc/trigger
69for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done
70grep 'bytes_req: ~ 2^[0-9]*' events/kmem/kmalloc/hist > /dev/null || \
71 fail "log2 modifier on kmem/kmalloc did not work"
72
73do_reset
74
75exit 0
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-hist.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-hist.tc
new file mode 100644
index 000000000000..b2902d42a537
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-hist.tc
@@ -0,0 +1,83 @@
1#!/bin/sh
2# description: event trigger - test histogram trigger
3
4do_reset() {
5 reset_trigger
6 echo > set_event
7 clear_trace
8}
9
10fail() { #msg
11 do_reset
12 echo $1
13 exit $FAIL
14}
15
16if [ ! -f set_event -o ! -d events/sched ]; then
17 echo "event tracing is not supported"
18 exit_unsupported
19fi
20
21if [ ! -f events/sched/sched_process_fork/trigger ]; then
22 echo "event trigger is not supported"
23 exit_unsupported
24fi
25
26reset_tracer
27do_reset
28
29FEATURE=`grep hist events/sched/sched_process_fork/trigger`
30if [ -z "$FEATURE" ]; then
31 echo "hist trigger is not supported"
32 exit_unsupported
33fi
34
35echo "Test histogram basic tigger"
36
37echo 'hist:keys=parent_pid:vals=child_pid' > events/sched/sched_process_fork/trigger
38for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done
39grep parent_pid events/sched/sched_process_fork/hist > /dev/null || \
40 fail "hist trigger on sched_process_fork did not work"
41grep child events/sched/sched_process_fork/hist > /dev/null || \
42 fail "hist trigger on sched_process_fork did not work"
43
44reset_trigger
45
46echo "Test histogram with compound keys"
47
48echo 'hist:keys=parent_pid,child_pid' > events/sched/sched_process_fork/trigger
49for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done
50grep '^{ parent_pid:.*, child_pid:.*}' events/sched/sched_process_fork/hist > /dev/null || \
51 fail "compound keys on sched_process_fork did not work"
52
53reset_trigger
54
55echo "Test histogram with string key"
56
57echo 'hist:keys=parent_comm' > events/sched/sched_process_fork/trigger
58for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done
59COMM=`cat /proc/$$/comm`
60grep "parent_comm: $COMM" events/sched/sched_process_fork/hist > /dev/null || \
61 fail "string key on sched_process_fork did not work"
62
63reset_trigger
64
65echo "Test histogram with sort key"
66
67echo 'hist:keys=parent_pid,child_pid:sort=child_pid.ascending' > events/sched/sched_process_fork/trigger
68for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done
69
70check_inc() {
71 while [ $# -gt 1 ]; do
72 [ $1 -gt $2 ] && return 1
73 shift 1
74 done
75 return 0
76}
77check_inc `grep -o "child_pid:[[:space:]]*[[:digit:]]*" \
78 events/sched/sched_process_fork/hist | cut -d: -f2 ` ||
79 fail "sort param on sched_process_fork did not work"
80
81do_reset
82
83exit 0
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-multihist.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-multihist.tc
new file mode 100644
index 000000000000..03c4a46561fc
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-multihist.tc
@@ -0,0 +1,73 @@
1#!/bin/sh
2# description: event trigger - test multiple histogram triggers
3
4do_reset() {
5 reset_trigger
6 echo > set_event
7 clear_trace
8}
9
10fail() { #msg
11 do_reset
12 echo $1
13 exit $FAIL
14}
15
16if [ ! -f set_event -o ! -d events/sched ]; then
17 echo "event tracing is not supported"
18 exit_unsupported
19fi
20
21if [ ! -f events/sched/sched_process_fork/trigger ]; then
22 echo "event trigger is not supported"
23 exit_unsupported
24fi
25
26reset_tracer
27do_reset
28
29FEATURE=`grep hist events/sched/sched_process_fork/trigger`
30if [ -z "$FEATURE" ]; then
31 echo "hist trigger is not supported"
32 exit_unsupported
33fi
34
35reset_trigger
36
37echo "Test histogram multiple tiggers"
38
39echo 'hist:keys=parent_pid:vals=child_pid' > events/sched/sched_process_fork/trigger
40echo 'hist:keys=parent_comm:vals=child_pid' >> events/sched/sched_process_fork/trigger
41for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done
42grep parent_pid events/sched/sched_process_fork/hist > /dev/null || \
43 fail "hist trigger on sched_process_fork did not work"
44grep child events/sched/sched_process_fork/hist > /dev/null || \
45 fail "hist trigger on sched_process_fork did not work"
46COMM=`cat /proc/$$/comm`
47grep "parent_comm: $COMM" events/sched/sched_process_fork/hist > /dev/null || \
48 fail "string key on sched_process_fork did not work"
49
50reset_trigger
51
52echo "Test histogram with its name"
53
54echo 'hist:name=test_hist:keys=common_pid' > events/sched/sched_process_fork/trigger
55for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done
56grep test_hist events/sched/sched_process_fork/hist > /dev/null || \
57 fail "named event on sched_process_fork did not work"
58
59echo "Test same named histogram on different events"
60
61echo 'hist:name=test_hist:keys=common_pid' > events/sched/sched_process_exit/trigger
62for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done
63grep test_hist events/sched/sched_process_exit/hist > /dev/null || \
64	fail "named event on sched_process_exit did not work"
65
66diffs=`diff events/sched/sched_process_exit/hist events/sched/sched_process_fork/hist | wc -l`
67test $diffs -eq 0 || fail "Same name histograms are not same"
68
69reset_trigger
70
71do_reset
72
73exit 0
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-snapshot.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-snapshot.tc
new file mode 100644
index 000000000000..f84b80d551a2
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-snapshot.tc
@@ -0,0 +1,56 @@
1#!/bin/sh
2# description: event trigger - test snapshot-trigger
3
4do_reset() {
5 reset_trigger
6 echo > set_event
7 clear_trace
8}
9
10fail() { #msg
11 do_reset
12 echo $1
13 exit $FAIL
14}
15
16if [ ! -f set_event -o ! -d events/sched ]; then
17 echo "event tracing is not supported"
18 exit_unsupported
19fi
20
21if [ ! -f events/sched/sched_process_fork/trigger ]; then
22 echo "event trigger is not supported"
23 exit_unsupported
24fi
25
26reset_tracer
27do_reset
28
29FEATURE=`grep snapshot events/sched/sched_process_fork/trigger`
30if [ -z "$FEATURE" ]; then
31 echo "snapshot trigger is not supported"
32 exit_unsupported
33fi
34
35echo "Test snapshot tigger"
36echo 0 > snapshot
37echo 1 > events/sched/sched_process_fork/enable
38( echo "forked")
39echo 'snapshot:1' > events/sched/sched_process_fork/trigger
40( echo "forked")
41grep sched_process_fork snapshot > /dev/null || \
42 fail "snapshot trigger on sched_process_fork did not work"
43
44reset_trigger
45echo 0 > snapshot
46echo 0 > events/sched/sched_process_fork/enable
47
48echo "Test snapshot semantic errors"
49
50! echo "snapshot+1" > events/sched/sched_process_fork/trigger
51echo "snapshot" > events/sched/sched_process_fork/trigger
52! echo "snapshot" > events/sched/sched_process_fork/trigger
53
54do_reset
55
56exit 0
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-stacktrace.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-stacktrace.tc
new file mode 100644
index 000000000000..9fa23b085def
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-stacktrace.tc
@@ -0,0 +1,53 @@
1#!/bin/sh
2# description: event trigger - test stacktrace-trigger
3
4do_reset() {
5 reset_trigger
6 echo > set_event
7 clear_trace
8}
9
10fail() { #msg
11 do_reset
12 echo $1
13 exit $FAIL
14}
15
16if [ ! -f set_event -o ! -d events/sched ]; then
17 echo "event tracing is not supported"
18 exit_unsupported
19fi
20
21if [ ! -f events/sched/sched_process_fork/trigger ]; then
22 echo "event trigger is not supported"
23 exit_unsupported
24fi
25
26reset_tracer
27do_reset
28
29FEATURE=`grep stacktrace events/sched/sched_process_fork/trigger`
30if [ -z "$FEATURE" ]; then
31 echo "stacktrace trigger is not supported"
32 exit_unsupported
33fi
34
35echo "Test stacktrace tigger"
36echo 0 > trace
37echo 0 > options/stacktrace
38echo 'stacktrace' > events/sched/sched_process_fork/trigger
39( echo "forked")
40grep "<stack trace>" trace > /dev/null || \
41 fail "stacktrace trigger on sched_process_fork did not work"
42
43reset_trigger
44
45echo "Test stacktrace semantic errors"
46
47! echo "stacktrace:foo" > events/sched/sched_process_fork/trigger
48echo "stacktrace" > events/sched/sched_process_fork/trigger
49! echo "stacktrace" > events/sched/sched_process_fork/trigger
50
51do_reset
52
53exit 0
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-traceonoff.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-traceonoff.tc
new file mode 100644
index 000000000000..87648e5f987c
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-traceonoff.tc
@@ -0,0 +1,58 @@
1#!/bin/sh
2# description: event trigger - test traceon/off trigger
3
4do_reset() {
5 reset_trigger
6 echo > set_event
7 clear_trace
8}
9
10fail() { #msg
11 do_reset
12 echo $1
13 exit $FAIL
14}
15
16if [ ! -f set_event -o ! -d events/sched ]; then
17 echo "event tracing is not supported"
18 exit_unsupported
19fi
20
21if [ ! -f events/sched/sched_process_fork/trigger ]; then
22 echo "event trigger is not supported"
23 exit_unsupported
24fi
25
26reset_tracer
27do_reset
28
29echo "Test traceoff trigger"
30echo 1 > tracing_on
31echo 'traceoff' > events/sched/sched_process_fork/trigger
32( echo "forked")
33if [ `cat tracing_on` -ne 0 ]; then
34 fail "traceoff trigger on sched_process_fork did not work"
35fi
36
37reset_trigger
38
39echo "Test traceon trigger"
40echo 0 > tracing_on
41echo 'traceon' > events/sched/sched_process_fork/trigger
42( echo "forked")
43if [ `cat tracing_on` -ne 1 ]; then
44	fail "traceon trigger on sched_process_fork did not work"
45fi
46
47reset_trigger
48
49echo "Test semantic error for traceoff/on trigger"
50! echo 'traceoff:badparam' > events/sched/sched_process_fork/trigger
51! echo 'traceoff+0' > events/sched/sched_process_fork/trigger
52echo 'traceon' > events/sched/sched_process_fork/trigger
53! echo 'traceon' > events/sched/sched_process_fork/trigger
54! echo 'traceoff' > events/sched/sched_process_fork/trigger
55
56do_reset
57
58exit 0