-rw-r--r--Documentation/ABI/testing/sysfs-class-powercap152
-rw-r--r--Documentation/power/powercap/powercap.txt236
-rw-r--r--arch/x86/include/asm/msr.h22
-rw-r--r--arch/x86/lib/msr-smp.c62
-rw-r--r--drivers/Kconfig2
-rw-r--r--drivers/Makefile1
-rw-r--r--drivers/powercap/Kconfig32
-rw-r--r--drivers/powercap/Makefile2
-rw-r--r--drivers/powercap/intel_rapl.c1395
-rw-r--r--drivers/powercap/powercap_sys.c685
-rw-r--r--include/linux/bitops.h3
-rw-r--r--include/linux/powercap.h325
12 files changed, 2917 insertions, 0 deletions
diff --git a/Documentation/ABI/testing/sysfs-class-powercap b/Documentation/ABI/testing/sysfs-class-powercap
new file mode 100644
index 000000000000..db3b3ff70d84
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-powercap
@@ -0,0 +1,152 @@
1What: /sys/class/powercap/
2Date: September 2013
3KernelVersion: 3.13
4Contact: linux-pm@vger.kernel.org
5Description:
6 The powercap/ class sub directory belongs to the power cap
7 subsystem. Refer to
8 Documentation/power/powercap/powercap.txt for details.
9
10What: /sys/class/powercap/<control type>
11Date: September 2013
12KernelVersion: 3.13
13Contact: linux-pm@vger.kernel.org
14Description:
15 A <control type> is a unique name under /sys/class/powercap.
16 Here <control type> determines how the power is going to be
17 controlled. A <control type> can contain multiple power zones.
18
19What: /sys/class/powercap/<control type>/enabled
20Date: September 2013
21KernelVersion: 3.13
22Contact: linux-pm@vger.kernel.org
23Description:
24 This allows power capping to be enabled or disabled for a "control type".
25 This status affects every power zone using this "control type".
26
27What: /sys/class/powercap/<control type>/<power zone>
28Date: September 2013
29KernelVersion: 3.13
30Contact: linux-pm@vger.kernel.org
31Description:
32 A power zone is a single device or a collection of devices, which can
33 be independently monitored and controlled. A power zone sysfs
34 entry is qualified with the name of the <control type>.
35 E.g. intel-rapl:0:1:1.
36
37What: /sys/class/powercap/<control type>/<power zone>/<child power zone>
38Date: September 2013
39KernelVersion: 3.13
40Contact: linux-pm@vger.kernel.org
41Description:
42 Power zones may be organized in a hierarchy in which child
43 power zones provide monitoring and control for a subset of
44 devices under the parent. For example, if there is a parent
45 power zone for a whole CPU package, each CPU core in it can
46 be a child power zone.
47
48What: /sys/class/powercap/.../<power zone>/name
49Date: September 2013
50KernelVersion: 3.13
51Contact: linux-pm@vger.kernel.org
52Description:
53 Specifies the name of this power zone.
54
55What: /sys/class/powercap/.../<power zone>/energy_uj
56Date: September 2013
57KernelVersion: 3.13
58Contact: linux-pm@vger.kernel.org
59Description:
60 Current energy counter in micro-joules. Write "0" to reset.
61 If the counter can not be reset, then this attribute is
62 read-only.
63
64What: /sys/class/powercap/.../<power zone>/max_energy_range_uj
65Date: September 2013
66KernelVersion: 3.13
67Contact: linux-pm@vger.kernel.org
68Description:
69 Range of the above energy counter in micro-joules.
70
71
72What: /sys/class/powercap/.../<power zone>/power_uw
73Date: September 2013
74KernelVersion: 3.13
75Contact: linux-pm@vger.kernel.org
76Description:
77 Current power in micro-watts.
78
79What: /sys/class/powercap/.../<power zone>/max_power_range_uw
80Date: September 2013
81KernelVersion: 3.13
82Contact: linux-pm@vger.kernel.org
83Description:
84 Range of the above power value in micro-watts.
85
86What: /sys/class/powercap/.../<power zone>/constraint_X_name
87Date: September 2013
88KernelVersion: 3.13
89Contact: linux-pm@vger.kernel.org
90Description:
91 Each power zone can define one or more constraints. Each
92 constraint can have an optional name. Here "X" can have values
93 from 0 to max integer.
94
95What: /sys/class/powercap/.../<power zone>/constraint_X_power_limit_uw
96Date: September 2013
97KernelVersion: 3.13
98Contact: linux-pm@vger.kernel.org
99Description:
100 Power limit in micro-watts, which should be applicable for
101 the time window specified by "constraint_X_time_window_us".
102 Here "X" can have values from 0 to max integer.
103
104What: /sys/class/powercap/.../<power zone>/constraint_X_time_window_us
105Date: September 2013
106KernelVersion: 3.13
107Contact: linux-pm@vger.kernel.org
108Description:
109 Time window in micro seconds. This is used along with
110 constraint_X_power_limit_uw to define a power constraint.
111 Here "X" can have values from 0 to max integer.
112
113
114What: /sys/class/powercap/<control type>/.../constraint_X_max_power_uw
115Date: September 2013
116KernelVersion: 3.13
117Contact: linux-pm@vger.kernel.org
118Description:
119 Maximum allowed power in micro watts for this constraint.
120 Here "X" can have values from 0 to max integer.
121
122What: /sys/class/powercap/<control type>/.../constraint_X_min_power_uw
123Date: September 2013
124KernelVersion: 3.13
125Contact: linux-pm@vger.kernel.org
126Description:
127 Minimum allowed power in micro watts for this constraint.
128 Here "X" can have values from 0 to max integer.
129
130What: /sys/class/powercap/.../<power zone>/constraint_X_max_time_window_us
131Date: September 2013
132KernelVersion: 3.13
133Contact: linux-pm@vger.kernel.org
134Description:
135 Maximum allowed time window in micro seconds for this
136 constraint. Here "X" can have values from 0 to max integer.
137
138What: /sys/class/powercap/.../<power zone>/constraint_X_min_time_window_us
139Date: September 2013
140KernelVersion: 3.13
141Contact: linux-pm@vger.kernel.org
142Description:
143 Minimum allowed time window in micro seconds for this
144 constraint. Here "X" can have values from 0 to max integer.
145
146What: /sys/class/powercap/.../<power zone>/enabled
147Date: September 2013
148KernelVersion: 3.13
149Contact: linux-pm@vger.kernel.org
150Description:
151 This allows power capping to be enabled or disabled at the power zone level.
152 This applies to the current power zone and its children.
diff --git a/Documentation/power/powercap/powercap.txt b/Documentation/power/powercap/powercap.txt
new file mode 100644
index 000000000000..1e6ef164e07a
--- /dev/null
+++ b/Documentation/power/powercap/powercap.txt
@@ -0,0 +1,236 @@
1Power Capping Framework
2==================================
3
4The power capping framework provides a consistent interface between the kernel
5and the user space that allows power capping drivers to expose the settings to
6user space in a uniform way.
7
8Terminology
9=========================
10The framework exposes power capping devices to user space via sysfs in the
11form of a tree of objects. The objects at the root level of the tree represent
12'control types', which correspond to different methods of power capping. For
13example, the intel-rapl control type represents the Intel "Running Average
14Power Limit" (RAPL) technology, whereas the 'idle-injection' control type
15corresponds to the use of idle injection for controlling power.
16
17Power zones represent different parts of the system, which can be controlled and
18monitored using the power capping method determined by the control type the
19given zone belongs to. They each contain attributes for monitoring power, as
20well as controls represented in the form of power constraints. If the parts of
21the system represented by different power zones are hierarchical (that is, one
22bigger part consists of multiple smaller parts that each have their own power
23controls), those power zones may also be organized in a hierarchy with one
24parent power zone containing multiple subzones and so on to reflect the power
25control topology of the system. In that case, it is possible to apply power
26capping to a set of devices together using the parent power zone and if more
27fine grained control is required, it can be applied through the subzones.
28
29
30Example sysfs interface tree:
31
32/sys/devices/virtual/powercap
33└── intel-rapl
34    ├── intel-rapl:0
35    │   ├── constraint_0_name
36    │   ├── constraint_0_power_limit_uw
37    │   ├── constraint_0_time_window_us
38    │   ├── constraint_1_name
39    │   ├── constraint_1_power_limit_uw
40    │   ├── constraint_1_time_window_us
41    │   ├── device -> ../../intel-rapl
42    │   ├── energy_uj
43    │   ├── intel-rapl:0:0
44    │   │   ├── constraint_0_name
45    │   │   ├── constraint_0_power_limit_uw
46    │   │   ├── constraint_0_time_window_us
47    │   │   ├── constraint_1_name
48    │   │   ├── constraint_1_power_limit_uw
49    │   │   ├── constraint_1_time_window_us
50    │   │   ├── device -> ../../intel-rapl:0
51    │   │   ├── energy_uj
52    │   │   ├── max_energy_range_uj
53    │   │   ├── name
54    │   │   ├── enabled
55    │   │   ├── power
56    │   │   │   ├── async
57    │   │   │   []
58    │   │   ├── subsystem -> ../../../../../../class/power_cap
59    │   │   └── uevent
60    │   ├── intel-rapl:0:1
61    │   │   ├── constraint_0_name
62    │   │   ├── constraint_0_power_limit_uw
63    │   │   ├── constraint_0_time_window_us
64    │   │   ├── constraint_1_name
65    │   │   ├── constraint_1_power_limit_uw
66    │   │   ├── constraint_1_time_window_us
67    │   │   ├── device -> ../../intel-rapl:0
68    │   │   ├── energy_uj
69    │   │   ├── max_energy_range_uj
70    │   │   ├── name
71    │   │   ├── enabled
72    │   │   ├── power
73    │   │   │   ├── async
74    │   │   │   []
75    │   │   ├── subsystem -> ../../../../../../class/power_cap
76    │   │   └── uevent
77    │   ├── max_energy_range_uj
78    │   ├── max_power_range_uw
79    │   ├── name
80    │   ├── enabled
81    │   ├── power
82    │   │   ├── async
83    │   │   []
84    │   ├── subsystem -> ../../../../../class/power_cap
85    │   ├── enabled
86    │   └── uevent
87    ├── intel-rapl:1
88    │   ├── constraint_0_name
89    │   ├── constraint_0_power_limit_uw
90    │   ├── constraint_0_time_window_us
91    │   ├── constraint_1_name
92    │   ├── constraint_1_power_limit_uw
93    │   ├── constraint_1_time_window_us
94    │   ├── device -> ../../intel-rapl
95    │   ├── energy_uj
96    │   ├── intel-rapl:1:0
97    │   │   ├── constraint_0_name
98    │   │   ├── constraint_0_power_limit_uw
99    │   │   ├── constraint_0_time_window_us
100    │   │   ├── constraint_1_name
101    │   │   ├── constraint_1_power_limit_uw
102    │   │   ├── constraint_1_time_window_us
103    │   │   ├── device -> ../../intel-rapl:1
104    │   │   ├── energy_uj
105    │   │   ├── max_energy_range_uj
106    │   │   ├── name
107    │   │   ├── enabled
108    │   │   ├── power
109    │   │   │   ├── async
110    │   │   │   []
111    │   │   ├── subsystem -> ../../../../../../class/power_cap
112    │   │   └── uevent
113    │   ├── intel-rapl:1:1
114    │   │   ├── constraint_0_name
115    │   │   ├── constraint_0_power_limit_uw
116    │   │   ├── constraint_0_time_window_us
117    │   │   ├── constraint_1_name
118    │   │   ├── constraint_1_power_limit_uw
119    │   │   ├── constraint_1_time_window_us
120    │   │   ├── device -> ../../intel-rapl:1
121    │   │   ├── energy_uj
122    │   │   ├── max_energy_range_uj
123    │   │   ├── name
124    │   │   ├── enabled
125    │   │   ├── power
126    │   │   │   ├── async
127    │   │   │   []
128    │   │   ├── subsystem -> ../../../../../../class/power_cap
129    │   │   └── uevent
130    │   ├── max_energy_range_uj
131    │   ├── max_power_range_uw
132    │   ├── name
133    │   ├── enabled
134    │   ├── power
135    │   │   ├── async
136    │   │   []
137    │   ├── subsystem -> ../../../../../class/power_cap
138    │   └── uevent
139    ├── power
140    │   ├── async
141    │   []
142    ├── subsystem -> ../../../../class/power_cap
143    ├── enabled
144    └── uevent
145
146The above example illustrates a case in which the Intel RAPL technology,
147available in Intel® IA-64 and IA-32 Processor Architectures, is used. There is one
148control type called intel-rapl which contains two power zones, intel-rapl:0 and
149intel-rapl:1, representing CPU packages. Each of these power zones contains
150two subzones, intel-rapl:j:0 and intel-rapl:j:1 (j = 0, 1), representing the
151"core" and the "uncore" parts of the given CPU package, respectively. All of
152the zones and subzones contain energy monitoring attributes (energy_uj,
153max_energy_range_uj) and constraint attributes (constraint_*) allowing controls
154to be applied (the constraints in the 'package' power zones apply to the whole
155CPU packages and the subzone constraints only apply to the respective parts of
156the given package individually). Since Intel RAPL doesn't provide an instantaneous
157power value, there is no power_uw attribute.
158
159In addition to that, each power zone contains a name attribute, allowing the
160part of the system represented by that zone to be identified.
161For example:
162
163cat /sys/class/powercap/intel-rapl/intel-rapl:0/name
164package-0
165
166The Intel RAPL technology allows two constraints, short term and long term,
167with two different time windows to be applied to each power zone. Thus for
168each zone there are 2 attributes representing the constraint names, 2 power
169limits and 2 attributes representing the sizes of the time windows. The
170constraint_j_* attributes correspond to the jth constraint (j = 0, 1).
171
172For example:
173 constraint_0_name
174 constraint_0_power_limit_uw
175 constraint_0_time_window_us
176 constraint_1_name
177 constraint_1_power_limit_uw
178 constraint_1_time_window_us
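The naming scheme above is regular enough to build attribute paths programmatically. The following is a minimal user-space sketch, not part of this patch; the helper name constraint_attr_path() is hypothetical, and the zone path, constraint index and field name are supplied by the caller without being checked against the running system.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format the sysfs path of one constraint attribute
 * of a power zone, following the constraint_<X>_<field> naming scheme.
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static int constraint_attr_path(char *buf, size_t len, const char *zone_path,
				int index, const char *field)
{
	int n = snprintf(buf, len, "%s/constraint_%d_%s",
			 zone_path, index, field);

	/* snprintf reports the length it needed; reject truncated output */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For instance, calling it with the zone path of intel-rapl:0, index 1 and field "power_limit_uw" yields a path ending in constraint_1_power_limit_uw.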
179
180Power Zone Attributes
181=================================
182Monitoring attributes
183----------------------
184
185energy_uj (rw): Current energy counter in micro joules. Write "0" to reset.
186If the counter can not be reset, then this attribute is read only.
187
188max_energy_range_uj (ro): Range of the above energy counter in micro-joules.
189
190power_uw (ro): Current power in micro watts.
191
192max_power_range_uw (ro): Range of the above power value in micro-watts.
193
194name (ro): Name of this power zone.
195
196It is possible that some domains have both power ranges and energy counter ranges;
197however, only one is mandatory.
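When a zone exposes only the energy counter (as with Intel RAPL, which has no power_uw), user space can derive an average power from two energy_uj samples. The sketch below is illustrative only and rests on two assumptions that this document does not state: the counter wraps back to 0 once it reaches max_energy_range_uj, and it wraps at most once between the two samples. The function names are hypothetical.

```c
#include <stdint.h>

/*
 * Energy consumed between two energy_uj readings, in microjoules.
 * Assumption: the counter wraps to 0 after reaching max_range_uj and
 * wraps at most once between the two samples.
 */
static uint64_t energy_delta_uj(uint64_t before_uj, uint64_t after_uj,
				uint64_t max_range_uj)
{
	if (after_uj >= before_uj)
		return after_uj - before_uj;

	/* counter wrapped: distance to the top of the range plus new value */
	return (max_range_uj - before_uj) + after_uj;
}

/*
 * Average power in microwatts over interval_us microseconds.
 * uJ / us equals W, so the quotient is scaled by 1e6 to obtain uW.
 * The multiplication can overflow for deltas above roughly 18 MJ.
 */
static uint64_t average_power_uw(uint64_t delta_uj, uint64_t interval_us)
{
	if (interval_us == 0)
		return 0;

	return delta_uj * 1000000ULL / interval_us;
}
```

A sampling interval much shorter than the time the counter needs to wrap keeps the single-wrap assumption safe.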
198
199Constraints
200----------------
201constraint_X_power_limit_uw (rw): Power limit in micro watts, which should be
202applicable for the time window specified by "constraint_X_time_window_us".
203
204constraint_X_time_window_us (rw): Time window in micro seconds.
205
206constraint_X_name (ro): An optional name of the constraint.
207
208constraint_X_max_power_uw (ro): Maximum allowed power in micro watts.
209
210constraint_X_min_power_uw (ro): Minimum allowed power in micro watts.
211
212constraint_X_max_time_window_us (ro): Maximum allowed time window in micro seconds.
213
214constraint_X_min_time_window_us (ro): Minimum allowed time window in micro seconds.
215
216All fields other than power_limit_uw and time_window_us are optional.
217
218Common zone and control type attributes
219----------------------------------------
220enabled (rw): Enable/Disable controls at zone level or for all zones using
221a control type.
222
223Power Cap Client Driver Interface
224==================================
225The API summary:
226
227Call powercap_register_control_type() to register a control type object.
228Call powercap_register_zone() to register a power zone (under a given
229control type), either as a top-level power zone or as a subzone of another
230power zone registered earlier.
231The number of constraints in a power zone and the corresponding callbacks have
232to be defined prior to calling powercap_register_zone() to register that zone.
233
234To free a power zone, call powercap_unregister_zone().
235To free a control type object, call powercap_unregister_control_type().
236Detailed API can be generated using kernel-doc on include/linux/powercap.h.
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index cb7502852acb..e139b13f2a33 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -218,10 +218,14 @@ void msrs_free(struct msr *msrs);
218#ifdef CONFIG_SMP 218#ifdef CONFIG_SMP
219int rdmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 *l, u32 *h); 219int rdmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 *l, u32 *h);
220int wrmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 l, u32 h); 220int wrmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 l, u32 h);
221int rdmsrl_on_cpu(unsigned int cpu, u32 msr_no, u64 *q);
222int wrmsrl_on_cpu(unsigned int cpu, u32 msr_no, u64 q);
221void rdmsr_on_cpus(const struct cpumask *mask, u32 msr_no, struct msr *msrs); 223void rdmsr_on_cpus(const struct cpumask *mask, u32 msr_no, struct msr *msrs);
222void wrmsr_on_cpus(const struct cpumask *mask, u32 msr_no, struct msr *msrs); 224void wrmsr_on_cpus(const struct cpumask *mask, u32 msr_no, struct msr *msrs);
223int rdmsr_safe_on_cpu(unsigned int cpu, u32 msr_no, u32 *l, u32 *h); 225int rdmsr_safe_on_cpu(unsigned int cpu, u32 msr_no, u32 *l, u32 *h);
224int wrmsr_safe_on_cpu(unsigned int cpu, u32 msr_no, u32 l, u32 h); 226int wrmsr_safe_on_cpu(unsigned int cpu, u32 msr_no, u32 l, u32 h);
227int rdmsrl_safe_on_cpu(unsigned int cpu, u32 msr_no, u64 *q);
228int wrmsrl_safe_on_cpu(unsigned int cpu, u32 msr_no, u64 q);
225int rdmsr_safe_regs_on_cpu(unsigned int cpu, u32 regs[8]); 229int rdmsr_safe_regs_on_cpu(unsigned int cpu, u32 regs[8]);
226int wrmsr_safe_regs_on_cpu(unsigned int cpu, u32 regs[8]); 230int wrmsr_safe_regs_on_cpu(unsigned int cpu, u32 regs[8]);
227#else /* CONFIG_SMP */ 231#else /* CONFIG_SMP */
@@ -235,6 +239,16 @@ static inline int wrmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 l, u32 h)
235 wrmsr(msr_no, l, h); 239 wrmsr(msr_no, l, h);
236 return 0; 240 return 0;
237} 241}
242static inline int rdmsrl_on_cpu(unsigned int cpu, u32 msr_no, u64 *q)
243{
244 rdmsrl(msr_no, *q);
245 return 0;
246}
247static inline int wrmsrl_on_cpu(unsigned int cpu, u32 msr_no, u64 q)
248{
249 wrmsrl(msr_no, q);
250 return 0;
251}
238static inline void rdmsr_on_cpus(const struct cpumask *m, u32 msr_no, 252static inline void rdmsr_on_cpus(const struct cpumask *m, u32 msr_no,
239 struct msr *msrs) 253 struct msr *msrs)
240{ 254{
@@ -254,6 +268,14 @@ static inline int wrmsr_safe_on_cpu(unsigned int cpu, u32 msr_no, u32 l, u32 h)
254{ 268{
255 return wrmsr_safe(msr_no, l, h); 269 return wrmsr_safe(msr_no, l, h);
256} 270}
271static inline int rdmsrl_safe_on_cpu(unsigned int cpu, u32 msr_no, u64 *q)
272{
273 return rdmsrl_safe(msr_no, q);
274}
275static inline int wrmsrl_safe_on_cpu(unsigned int cpu, u32 msr_no, u64 q)
276{
277 return wrmsrl_safe(msr_no, q);
278}
257static inline int rdmsr_safe_regs_on_cpu(unsigned int cpu, u32 regs[8]) 279static inline int rdmsr_safe_regs_on_cpu(unsigned int cpu, u32 regs[8])
258{ 280{
259 return rdmsr_safe_regs(regs); 281 return rdmsr_safe_regs(regs);
diff --git a/arch/x86/lib/msr-smp.c b/arch/x86/lib/msr-smp.c
index a6b1b86d2253..518532e6a3fa 100644
--- a/arch/x86/lib/msr-smp.c
+++ b/arch/x86/lib/msr-smp.c
@@ -47,6 +47,21 @@ int rdmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 *l, u32 *h)
47} 47}
48EXPORT_SYMBOL(rdmsr_on_cpu); 48EXPORT_SYMBOL(rdmsr_on_cpu);
49 49
50int rdmsrl_on_cpu(unsigned int cpu, u32 msr_no, u64 *q)
51{
52 int err;
53 struct msr_info rv;
54
55 memset(&rv, 0, sizeof(rv));
56
57 rv.msr_no = msr_no;
58 err = smp_call_function_single(cpu, __rdmsr_on_cpu, &rv, 1);
59 *q = rv.reg.q;
60
61 return err;
62}
63EXPORT_SYMBOL(rdmsrl_on_cpu);
64
50int wrmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 l, u32 h) 65int wrmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 l, u32 h)
51{ 66{
52 int err; 67 int err;
@@ -63,6 +78,22 @@ int wrmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 l, u32 h)
63} 78}
64EXPORT_SYMBOL(wrmsr_on_cpu); 79EXPORT_SYMBOL(wrmsr_on_cpu);
65 80
81int wrmsrl_on_cpu(unsigned int cpu, u32 msr_no, u64 q)
82{
83 int err;
84 struct msr_info rv;
85
86 memset(&rv, 0, sizeof(rv));
87
88 rv.msr_no = msr_no;
89 rv.reg.q = q;
90
91 err = smp_call_function_single(cpu, __wrmsr_on_cpu, &rv, 1);
92
93 return err;
94}
95EXPORT_SYMBOL(wrmsrl_on_cpu);
96
66static void __rwmsr_on_cpus(const struct cpumask *mask, u32 msr_no, 97static void __rwmsr_on_cpus(const struct cpumask *mask, u32 msr_no,
67 struct msr *msrs, 98 struct msr *msrs,
68 void (*msr_func) (void *info)) 99 void (*msr_func) (void *info))
@@ -159,6 +190,37 @@ int wrmsr_safe_on_cpu(unsigned int cpu, u32 msr_no, u32 l, u32 h)
159} 190}
160EXPORT_SYMBOL(wrmsr_safe_on_cpu); 191EXPORT_SYMBOL(wrmsr_safe_on_cpu);
161 192
193int wrmsrl_safe_on_cpu(unsigned int cpu, u32 msr_no, u64 q)
194{
195 int err;
196 struct msr_info rv;
197
198 memset(&rv, 0, sizeof(rv));
199
200 rv.msr_no = msr_no;
201 rv.reg.q = q;
202
203 err = smp_call_function_single(cpu, __wrmsr_safe_on_cpu, &rv, 1);
204
205 return err ? err : rv.err;
206}
207EXPORT_SYMBOL(wrmsrl_safe_on_cpu);
208
209int rdmsrl_safe_on_cpu(unsigned int cpu, u32 msr_no, u64 *q)
210{
211 int err;
212 struct msr_info rv;
213
214 memset(&rv, 0, sizeof(rv));
215
216 rv.msr_no = msr_no;
217 err = smp_call_function_single(cpu, __rdmsr_safe_on_cpu, &rv, 1);
218 *q = rv.reg.q;
219
220 return err ? err : rv.err;
221}
222EXPORT_SYMBOL(rdmsrl_safe_on_cpu);
223
162/* 224/*
163 * These variants are significantly slower, but allows control over 225 * These variants are significantly slower, but allows control over
164 * the entire 32-bit GPR set. 226 * the entire 32-bit GPR set.
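The rdmsrl_*/wrmsrl_* variants added above carry the register as a single u64 (q) rather than as separate low and high u32 halves (l, h). The relationship between the two forms can be sketched in plain user-space C; the helper names msr_join() and msr_split() are hypothetical and do not exist in the kernel.

```c
#include <stdint.h>

/* Combine the low and high halves, as returned by a rdmsr-style call. */
static uint64_t msr_join(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Split a 64-bit value into the halves a wrmsr-style call expects. */
static void msr_split(uint64_t q, uint32_t *lo, uint32_t *hi)
{
	*lo = (uint32_t)q;		/* bits 31:0 */
	*hi = (uint32_t)(q >> 32);	/* bits 63:32 */
}
```

Callers that work with 64-bit masks, such as the RAPL limit fields above bit 31, avoid this manual joining and splitting by using the 64-bit variants.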
diff --git a/drivers/Kconfig b/drivers/Kconfig
index aa43b911ccef..969e9871785c 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -166,4 +166,6 @@ source "drivers/reset/Kconfig"
166 166
167source "drivers/fmc/Kconfig" 167source "drivers/fmc/Kconfig"
168 168
169source "drivers/powercap/Kconfig"
170
169endmenu 171endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index ab93de8297f1..34c1d554f69b 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -152,3 +152,4 @@ obj-$(CONFIG_VME_BUS) += vme/
152obj-$(CONFIG_IPACK_BUS) += ipack/ 152obj-$(CONFIG_IPACK_BUS) += ipack/
153obj-$(CONFIG_NTB) += ntb/ 153obj-$(CONFIG_NTB) += ntb/
154obj-$(CONFIG_FMC) += fmc/ 154obj-$(CONFIG_FMC) += fmc/
155obj-$(CONFIG_POWERCAP) += powercap/
diff --git a/drivers/powercap/Kconfig b/drivers/powercap/Kconfig
new file mode 100644
index 000000000000..a7c81b53d88a
--- /dev/null
+++ b/drivers/powercap/Kconfig
@@ -0,0 +1,32 @@
1#
2# Generic power capping sysfs interface configuration
3#
4
5menuconfig POWERCAP
6 bool "Generic powercap sysfs driver"
7 help
8 The power capping sysfs interface allows kernel subsystems to expose power
9 capping settings to user space in a consistent way. Usually, it consists
10 of multiple control types that determine which settings may be exposed and
11 power zones representing parts of the system that can be subject to power
12 capping.
13
14 If you want this code to be compiled in, say Y here.
15
16if POWERCAP
17# Client driver configurations go here.
18config INTEL_RAPL
19 tristate "Intel RAPL Support"
20 depends on X86
21 default n
22 ---help---
23 This enables support for the Intel Running Average Power Limit (RAPL)
24 technology which allows power limits to be enforced and monitored on
25 modern Intel processors (Sandy Bridge and later).
26
27 In RAPL, the platform level settings are divided into domains for
28 fine grained control. These domains include processor package, DRAM
29 controller, CPU core (Power Plance 0), graphics uncore (Power Plane
30 1), etc.
31
32endif
diff --git a/drivers/powercap/Makefile b/drivers/powercap/Makefile
new file mode 100644
index 000000000000..0a21ef31372b
--- /dev/null
+++ b/drivers/powercap/Makefile
@@ -0,0 +1,2 @@
1obj-$(CONFIG_POWERCAP) += powercap_sys.o
2obj-$(CONFIG_INTEL_RAPL) += intel_rapl.o
diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
new file mode 100644
index 000000000000..2a786c504460
--- /dev/null
+++ b/drivers/powercap/intel_rapl.c
@@ -0,0 +1,1395 @@
1/*
2 * Intel Running Average Power Limit (RAPL) Driver
3 * Copyright (c) 2013, Intel Corporation.
4 *
5 * This program is free software; you can redistribute it and/or modify it
6 * under the terms and conditions of the GNU General Public License,
7 * version 2, as published by the Free Software Foundation.
8 *
9 * This program is distributed in the hope it will be useful, but WITHOUT
10 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
11 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
12 * more details.
13 *
14 * You should have received a copy of the GNU General Public License along with
15 * this program; if not, write to the Free Software Foundation, Inc.
16 *
17 */
18#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
19
20#include <linux/kernel.h>
21#include <linux/module.h>
22#include <linux/list.h>
23#include <linux/types.h>
24#include <linux/device.h>
25#include <linux/slab.h>
26#include <linux/log2.h>
27#include <linux/bitmap.h>
28#include <linux/delay.h>
29#include <linux/sysfs.h>
30#include <linux/cpu.h>
31#include <linux/powercap.h>
32
33#include <asm/processor.h>
34#include <asm/cpu_device_id.h>
35
36/* bitmasks for RAPL MSRs, used by primitive access functions */
37#define ENERGY_STATUS_MASK 0xffffffff
38
39#define POWER_LIMIT1_MASK 0x7FFF
40#define POWER_LIMIT1_ENABLE BIT(15)
41#define POWER_LIMIT1_CLAMP BIT(16)
42
43#define POWER_LIMIT2_MASK (0x7FFFULL<<32)
44#define POWER_LIMIT2_ENABLE BIT_ULL(47)
45#define POWER_LIMIT2_CLAMP BIT_ULL(48)
46#define POWER_PACKAGE_LOCK BIT_ULL(63)
47#define POWER_PP_LOCK BIT(31)
48
49#define TIME_WINDOW1_MASK (0x7FULL<<17)
50#define TIME_WINDOW2_MASK (0x7FULL<<49)
51
52#define POWER_UNIT_OFFSET 0
53#define POWER_UNIT_MASK 0x0F
54
55#define ENERGY_UNIT_OFFSET 0x08
56#define ENERGY_UNIT_MASK 0x1F00
57
58#define TIME_UNIT_OFFSET 0x10
59#define TIME_UNIT_MASK 0xF0000
60
61#define POWER_INFO_MAX_MASK (0x7fffULL<<32)
62#define POWER_INFO_MIN_MASK (0x7fffULL<<16)
63#define POWER_INFO_MAX_TIME_WIN_MASK (0x3fULL<<48)
64#define POWER_INFO_THERMAL_SPEC_MASK 0x7fff
65
66#define PERF_STATUS_THROTTLE_TIME_MASK 0xffffffff
67#define PP_POLICY_MASK 0x1F
68
69/* Non HW constants */
70#define RAPL_PRIMITIVE_DERIVED BIT(1) /* not from raw data */
71#define RAPL_PRIMITIVE_DUMMY BIT(2)
72
73/* scale RAPL units to avoid floating point math inside kernel */
74#define POWER_UNIT_SCALE (1000000)
75#define ENERGY_UNIT_SCALE (1000000)
76#define TIME_UNIT_SCALE (1000000)
77
78#define TIME_WINDOW_MAX_MSEC 40000
79#define TIME_WINDOW_MIN_MSEC 250
80
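The unit masks and scale factors above suggest how raw register values are turned into micro-units without floating point. The following stand-alone sketch copies the field layout from the defines above; it assumes, without confirmation from this part of the patch, that each unit field holds an exponent n and that the unit is 1/2^n of a watt, joule or second. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Field layout copied from the defines above. */
#define POWER_UNIT_OFFSET	0
#define POWER_UNIT_MASK		0x0F
#define ENERGY_UNIT_OFFSET	0x08
#define ENERGY_UNIT_MASK	0x1F00
#define TIME_UNIT_OFFSET	0x10
#define TIME_UNIT_MASK		0xF0000
#define ENERGY_UNIT_SCALE	1000000

struct rapl_units {
	unsigned int power_divisor;
	unsigned int energy_divisor;
	unsigned int time_divisor;
};

/*
 * Assumption: each field is an exponent n, the unit is 1/2^n, and the
 * divisor is therefore 1 << n.
 */
static struct rapl_units decode_units(uint64_t raw)
{
	struct rapl_units u;

	u.power_divisor  = 1u << ((raw & POWER_UNIT_MASK) >> POWER_UNIT_OFFSET);
	u.energy_divisor = 1u << ((raw & ENERGY_UNIT_MASK) >> ENERGY_UNIT_OFFSET);
	u.time_divisor   = 1u << ((raw & TIME_UNIT_MASK) >> TIME_UNIT_OFFSET);

	return u;
}

/* Raw energy counter value to microjoules, using integer math only. */
static uint64_t energy_raw_to_uj(uint64_t raw_count, unsigned int energy_divisor)
{
	return raw_count * ENERGY_UNIT_SCALE / energy_divisor;
}
```

Multiplying by the scale before dividing keeps sub-joule resolution that a plain integer division by the divisor would discard.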
81enum unit_type {
82 ARBITRARY_UNIT, /* no translation */
83 POWER_UNIT,
84 ENERGY_UNIT,
85 TIME_UNIT,
86};
87
88enum rapl_domain_type {
89 RAPL_DOMAIN_PACKAGE, /* entire package/socket */
90 RAPL_DOMAIN_PP0, /* core power plane */
91 RAPL_DOMAIN_PP1, /* graphics uncore */
92 RAPL_DOMAIN_DRAM,/* DRAM control_type */
93 RAPL_DOMAIN_MAX,
94};
95
96enum rapl_domain_msr_id {
97 RAPL_DOMAIN_MSR_LIMIT,
98 RAPL_DOMAIN_MSR_STATUS,
99 RAPL_DOMAIN_MSR_PERF,
100 RAPL_DOMAIN_MSR_POLICY,
101 RAPL_DOMAIN_MSR_INFO,
102 RAPL_DOMAIN_MSR_MAX,
103};
104
105/* per domain data, some are optional */
106enum rapl_primitives {
107 ENERGY_COUNTER,
108 POWER_LIMIT1,
109 POWER_LIMIT2,
110 FW_LOCK,
111
112 PL1_ENABLE, /* power limit 1, aka long term */
113 PL1_CLAMP, /* allow frequency to go below OS request */
114 PL2_ENABLE, /* power limit 2, aka short term, instantaneous */
115 PL2_CLAMP,
116
117 TIME_WINDOW1, /* long term */
118 TIME_WINDOW2, /* short term */
119 THERMAL_SPEC_POWER,
120 MAX_POWER,
121
122 MIN_POWER,
123 MAX_TIME_WINDOW,
124 THROTTLED_TIME,
125 PRIORITY_LEVEL,
126
127 /* below are not raw primitive data */
128 AVERAGE_POWER,
129 NR_RAPL_PRIMITIVES,
130};
131
132#define NR_RAW_PRIMITIVES (NR_RAPL_PRIMITIVES - 2)
133
134/* Can be expanded to include events, etc.*/
135struct rapl_domain_data {
136 u64 primitives[NR_RAPL_PRIMITIVES];
137 unsigned long timestamp;
138};
139
140
141#define DOMAIN_STATE_INACTIVE BIT(0)
142#define DOMAIN_STATE_POWER_LIMIT_SET BIT(1)
143#define DOMAIN_STATE_BIOS_LOCKED BIT(2)
144
145#define NR_POWER_LIMITS (2)
146struct rapl_power_limit {
147 struct powercap_zone_constraint *constraint;
148 int prim_id; /* primitive ID used to enable */
149 struct rapl_domain *domain;
150 const char *name;
151};
152
153static const char pl1_name[] = "long_term";
154static const char pl2_name[] = "short_term";
155
156struct rapl_domain {
157 const char *name;
158 enum rapl_domain_type id;
159 int msrs[RAPL_DOMAIN_MSR_MAX];
160 struct powercap_zone power_zone;
161 struct rapl_domain_data rdd;
162 struct rapl_power_limit rpl[NR_POWER_LIMITS];
163 u64 attr_map; /* track capabilities */
164 unsigned int state;
165 int package_id;
166};
167#define power_zone_to_rapl_domain(_zone) \
168 container_of(_zone, struct rapl_domain, power_zone)
169
170
171/* Each physical package contains multiple domains, these are the common
172 * data across RAPL domains within a package.
173 */
174struct rapl_package {
175 unsigned int id; /* physical package/socket id */
176 unsigned int nr_domains;
177 unsigned long domain_map; /* bit map of active domains */
178 unsigned int power_unit_divisor;
179 unsigned int energy_unit_divisor;
180 unsigned int time_unit_divisor;
181 struct rapl_domain *domains; /* array of domains, sized at runtime */
182 struct powercap_zone *power_zone; /* keep track of parent zone */
183 int nr_cpus; /* active cpus on the package, topology info is lost during
184 * cpu hotplug. so we have to track ourselves.
185 */
186 unsigned long power_limit_irq; /* keep track of package power limit
187 * notify interrupt enable status.
188 */
189 struct list_head plist;
190};
191#define PACKAGE_PLN_INT_SAVED BIT(0)
192#define MAX_PRIM_NAME (32)
193
194/* per domain data. used to describe individual knobs such that access function
195 * can be consolidated into one instead of many inline functions.
196 */
197struct rapl_primitive_info {
198 const char *name;
199 u64 mask;
200 int shift;
201 enum rapl_domain_msr_id id;
202 enum unit_type unit;
203 u32 flag;
204};
205
206#define PRIMITIVE_INFO_INIT(p, m, s, i, u, f) { \
207 .name = #p, \
208 .mask = m, \
209 .shift = s, \
210 .id = i, \
211 .unit = u, \
212 .flag = f \
213 }
214
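Each rapl_primitive_info entry pairs a mask with a shift so that one generic accessor can serve every field. A minimal sketch of that mask-and-shift pattern follows; the shift values used in the example (0, 15 and 17) are assumptions inferred from the bit positions of the masks defined earlier, and field_get()/field_set() are hypothetical names rather than functions from this driver.

```c
#include <stdint.h>

/* Masks copied from the defines earlier in this file. */
#define POWER_LIMIT1_MASK	0x7FFF
#define POWER_LIMIT1_ENABLE	(1ULL << 15)
#define TIME_WINDOW1_MASK	(0x7FULL << 17)

/* Extract a field: keep the masked bits, then shift them down. */
static uint64_t field_get(uint64_t reg, uint64_t mask, int shift)
{
	return (reg & mask) >> shift;
}

/* Replace a field, leaving all other bits of reg untouched. */
static uint64_t field_set(uint64_t reg, uint64_t mask, int shift, uint64_t val)
{
	return (reg & ~mask) | ((val << shift) & mask);
}
```

Describing each knob as data (mask, shift, MSR id, unit) is what lets the driver consolidate access into one read routine and one write routine instead of many inline functions.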
215static void rapl_init_domains(struct rapl_package *rp);
216static int rapl_read_data_raw(struct rapl_domain *rd,
217 enum rapl_primitives prim,
218 bool xlate, u64 *data);
219static int rapl_write_data_raw(struct rapl_domain *rd,
220 enum rapl_primitives prim,
221 unsigned long long value);
222static u64 rapl_unit_xlate(int package, enum unit_type type, u64 value,
223 int to_raw);
224static void package_power_limit_irq_save(int package_id);
225
226static LIST_HEAD(rapl_packages); /* guarded by CPU hotplug lock */
227
228static const char * const rapl_domain_names[] = {
229 "package",
230 "core",
231 "uncore",
232 "dram",
233};
234
235static struct powercap_control_type *control_type; /* PowerCap Controller */
236
237/* caller to ensure CPU hotplug lock is held */
238static struct rapl_package *find_package_by_id(int id)
239{
240 struct rapl_package *rp;
241
242 list_for_each_entry(rp, &rapl_packages, plist) {
243 if (rp->id == id)
244 return rp;
245 }
246
247 return NULL;
248}
249
250/* caller to ensure CPU hotplug lock is held */
251static int find_active_cpu_on_package(int package_id)
252{
253 int i;
254
255 for_each_online_cpu(i) {
256 if (topology_physical_package_id(i) == package_id)
257 return i;
258 }
259 /* all CPUs on this package are offline */
260
261 return -ENODEV;
262}
263
264/* caller must hold cpu hotplug lock */
265static void rapl_cleanup_data(void)
266{
267 struct rapl_package *p, *tmp;
268
269 list_for_each_entry_safe(p, tmp, &rapl_packages, plist) {
270 kfree(p->domains);
271 list_del(&p->plist);
272 kfree(p);
273 }
274}
275
276static int get_energy_counter(struct powercap_zone *power_zone, u64 *energy_raw)
277{
278 struct rapl_domain *rd;
279 u64 energy_now;
280
281 /* prevent CPU hotplug, make sure the RAPL domain does not go
282 * away while reading the counter.
283 */
284 get_online_cpus();
285 rd = power_zone_to_rapl_domain(power_zone);
286
287 if (!rapl_read_data_raw(rd, ENERGY_COUNTER, true, &energy_now)) {
288 *energy_raw = energy_now;
289 put_online_cpus();
290
291 return 0;
292 }
293 put_online_cpus();
294
295 return -EIO;
296}
297
298static int get_max_energy_counter(struct powercap_zone *pcd_dev, u64 *energy)
299{
300 *energy = rapl_unit_xlate(0, ENERGY_UNIT, ENERGY_STATUS_MASK, 0);
301 return 0;
302}
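
A consumer that samples the energy counter periodically has to cope with the counter wrapping at the value reported by get_max_energy_counter(). The following is a minimal userspace sketch of that bookkeeping, not part of the driver. The helper name energy_delta_uj is hypothetical, and it assumes at most one wrap between two samples and ignores the one-count ambiguity at the wrap point.

```c
#include <stdint.h>

/*
 * Hypothetical helper: energy consumed between two samples of a counter
 * that wraps back to zero once it passes max_range (all values in uJ).
 * Assumes the counter wrapped at most once between the two samples.
 */
static uint64_t energy_delta_uj(uint64_t prev, uint64_t now, uint64_t max_range)
{
	if (now >= prev)
		return now - prev;          /* no wrap between samples */

	return max_range - prev + now;      /* counter wrapped once */
}
```

Sampling often enough that the counter cannot wrap twice between reads is the caller's responsibility in this sketch.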
303
304static int release_zone(struct powercap_zone *power_zone)
305{
306 struct rapl_domain *rd = power_zone_to_rapl_domain(power_zone);
307 struct rapl_package *rp;
308
309 /* package zone is the last zone of a package, we can free
310 * memory here since all children have been unregistered.
311 */
312 if (rd->id == RAPL_DOMAIN_PACKAGE) {
313 rp = find_package_by_id(rd->package_id);
314 if (!rp) {
315 dev_warn(&power_zone->dev, "no package id %s\n",
316 rd->name);
317 return -ENODEV;
318 }
319 kfree(rd);
320 rp->domains = NULL;
321 }
322
323 return 0;
324
325}
326
327static int find_nr_power_limit(struct rapl_domain *rd)
328{
329 int i;
330
331 for (i = 0; i < NR_POWER_LIMITS; i++) {
332 if (rd->rpl[i].name == NULL)
333 break;
334 }
335
336 return i;
337}
338
339static int set_domain_enable(struct powercap_zone *power_zone, bool mode)
340{
341 struct rapl_domain *rd = power_zone_to_rapl_domain(power_zone);
342 int nr_powerlimit;
343
344 if (rd->state & DOMAIN_STATE_BIOS_LOCKED)
345 return -EACCES;
346 get_online_cpus();
347 nr_powerlimit = find_nr_power_limit(rd);
348 /* here we activate/deactivate the hardware for power limiting */
349 rapl_write_data_raw(rd, PL1_ENABLE, mode);
350 /* always enable clamp such that p-state can go below OS requested
351 * range. Power capping takes priority over guaranteed frequency.
352 */
353 rapl_write_data_raw(rd, PL1_CLAMP, mode);
354 /* some domains have pl2 */
355 if (nr_powerlimit > 1) {
356 rapl_write_data_raw(rd, PL2_ENABLE, mode);
357 rapl_write_data_raw(rd, PL2_CLAMP, mode);
358 }
359 put_online_cpus();
360
361 return 0;
362}
363
364static int get_domain_enable(struct powercap_zone *power_zone, bool *mode)
365{
366 struct rapl_domain *rd = power_zone_to_rapl_domain(power_zone);
367 u64 val;
368
369 if (rd->state & DOMAIN_STATE_BIOS_LOCKED) {
370 *mode = false;
371 return 0;
372 }
373 get_online_cpus();
374 if (rapl_read_data_raw(rd, PL1_ENABLE, true, &val)) {
375 put_online_cpus();
376 return -EIO;
377 }
378 *mode = val;
379 put_online_cpus();
380
381 return 0;
382}
383
384/* per RAPL domain ops, in the order of rapl_domain_type */
385static struct powercap_zone_ops zone_ops[] = {
386 /* RAPL_DOMAIN_PACKAGE */
387 {
388 .get_energy_uj = get_energy_counter,
389 .get_max_energy_range_uj = get_max_energy_counter,
390 .release = release_zone,
391 .set_enable = set_domain_enable,
392 .get_enable = get_domain_enable,
393 },
394 /* RAPL_DOMAIN_PP0 */
395 {
396 .get_energy_uj = get_energy_counter,
397 .get_max_energy_range_uj = get_max_energy_counter,
398 .release = release_zone,
399 .set_enable = set_domain_enable,
400 .get_enable = get_domain_enable,
401 },
402 /* RAPL_DOMAIN_PP1 */
403 {
404 .get_energy_uj = get_energy_counter,
405 .get_max_energy_range_uj = get_max_energy_counter,
406 .release = release_zone,
407 .set_enable = set_domain_enable,
408 .get_enable = get_domain_enable,
409 },
410 /* RAPL_DOMAIN_DRAM */
411 {
412 .get_energy_uj = get_energy_counter,
413 .get_max_energy_range_uj = get_max_energy_counter,
414 .release = release_zone,
415 .set_enable = set_domain_enable,
416 .get_enable = get_domain_enable,
417 },
418};
419
420static int set_power_limit(struct powercap_zone *power_zone, int id,
421 u64 power_limit)
422{
423 struct rapl_domain *rd;
424 struct rapl_package *rp;
425 int ret = 0;
426
427 get_online_cpus();
428 rd = power_zone_to_rapl_domain(power_zone);
429 rp = find_package_by_id(rd->package_id);
430 if (!rp) {
431 ret = -ENODEV;
432 goto set_exit;
433 }
434
435 if (rd->state & DOMAIN_STATE_BIOS_LOCKED) {
436 dev_warn(&power_zone->dev, "%s locked by BIOS, monitoring only\n",
437 rd->name);
438 ret = -EACCES;
439 goto set_exit;
440 }
441
442 switch (rd->rpl[id].prim_id) {
443 case PL1_ENABLE:
444 rapl_write_data_raw(rd, POWER_LIMIT1, power_limit);
445 break;
446 case PL2_ENABLE:
447 rapl_write_data_raw(rd, POWER_LIMIT2, power_limit);
448 break;
449 default:
450 ret = -EINVAL;
451 }
452 if (!ret)
453 package_power_limit_irq_save(rd->package_id);
454set_exit:
455 put_online_cpus();
456 return ret;
457}
458
459static int get_current_power_limit(struct powercap_zone *power_zone, int id,
460 u64 *data)
461{
462 struct rapl_domain *rd;
463 u64 val;
464 int prim;
465 int ret = 0;
466
467 get_online_cpus();
468 rd = power_zone_to_rapl_domain(power_zone);
469 switch (rd->rpl[id].prim_id) {
470 case PL1_ENABLE:
471 prim = POWER_LIMIT1;
472 break;
473 case PL2_ENABLE:
474 prim = POWER_LIMIT2;
475 break;
476 default:
477 put_online_cpus();
478 return -EINVAL;
479 }
480 if (rapl_read_data_raw(rd, prim, true, &val))
481 ret = -EIO;
482 else
483 *data = val;
484
485 put_online_cpus();
486
487 return ret;
488}
489
490static int set_time_window(struct powercap_zone *power_zone, int id,
491 u64 window)
492{
493 struct rapl_domain *rd;
494 int ret = 0;
495
496 get_online_cpus();
497 rd = power_zone_to_rapl_domain(power_zone);
498 switch (rd->rpl[id].prim_id) {
499 case PL1_ENABLE:
500 rapl_write_data_raw(rd, TIME_WINDOW1, window);
501 break;
502 case PL2_ENABLE:
503 rapl_write_data_raw(rd, TIME_WINDOW2, window);
504 break;
505 default:
506 ret = -EINVAL;
507 }
508 put_online_cpus();
509 return ret;
510}
511
512static int get_time_window(struct powercap_zone *power_zone, int id, u64 *data)
513{
514 struct rapl_domain *rd;
515 u64 val;
516 int ret = 0;
517
518 get_online_cpus();
519 rd = power_zone_to_rapl_domain(power_zone);
520 switch (rd->rpl[id].prim_id) {
521 case PL1_ENABLE:
522 ret = rapl_read_data_raw(rd, TIME_WINDOW1, true, &val);
523 break;
524 case PL2_ENABLE:
525 ret = rapl_read_data_raw(rd, TIME_WINDOW2, true, &val);
526 break;
527 default:
528 put_online_cpus();
529 return -EINVAL;
530 }
531 if (!ret)
532 *data = val;
533 put_online_cpus();
534
535 return ret;
536}
537
538static const char *get_constraint_name(struct powercap_zone *power_zone, int id)
539{
540 struct rapl_power_limit *rpl;
541 struct rapl_domain *rd;
542
543 rd = power_zone_to_rapl_domain(power_zone);
544 rpl = (struct rapl_power_limit *) &rd->rpl[id];
545
546 return rpl->name;
547}
548
549
550static int get_max_power(struct powercap_zone *power_zone, int id,
551 u64 *data)
552{
553 struct rapl_domain *rd;
554 u64 val;
555 int prim;
556 int ret = 0;
557
558 get_online_cpus();
559 rd = power_zone_to_rapl_domain(power_zone);
560 switch (rd->rpl[id].prim_id) {
561 case PL1_ENABLE:
562 prim = THERMAL_SPEC_POWER;
563 break;
564 case PL2_ENABLE:
565 prim = MAX_POWER;
566 break;
567 default:
568 put_online_cpus();
569 return -EINVAL;
570 }
571 if (rapl_read_data_raw(rd, prim, true, &val))
572 ret = -EIO;
573 else
574 *data = val;
575
576 put_online_cpus();
577
578 return ret;
579}
580
581static struct powercap_zone_constraint_ops constraint_ops = {
582 .set_power_limit_uw = set_power_limit,
583 .get_power_limit_uw = get_current_power_limit,
584 .set_time_window_us = set_time_window,
585 .get_time_window_us = get_time_window,
586 .get_max_power_uw = get_max_power,
587 .get_name = get_constraint_name,
588};
589
590/* called after domain detection and package level data are set */
591static void rapl_init_domains(struct rapl_package *rp)
592{
593 int i;
594 struct rapl_domain *rd = rp->domains;
595
596 for (i = 0; i < RAPL_DOMAIN_MAX; i++) {
597 unsigned int mask = rp->domain_map & (1 << i);
598 switch (mask) {
599 case BIT(RAPL_DOMAIN_PACKAGE):
600 rd->name = rapl_domain_names[RAPL_DOMAIN_PACKAGE];
601 rd->id = RAPL_DOMAIN_PACKAGE;
602 rd->msrs[0] = MSR_PKG_POWER_LIMIT;
603 rd->msrs[1] = MSR_PKG_ENERGY_STATUS;
604 rd->msrs[2] = MSR_PKG_PERF_STATUS;
605 rd->msrs[3] = 0;
606 rd->msrs[4] = MSR_PKG_POWER_INFO;
607 rd->rpl[0].prim_id = PL1_ENABLE;
608 rd->rpl[0].name = pl1_name;
609 rd->rpl[1].prim_id = PL2_ENABLE;
610 rd->rpl[1].name = pl2_name;
611 break;
612 case BIT(RAPL_DOMAIN_PP0):
613 rd->name = rapl_domain_names[RAPL_DOMAIN_PP0];
614 rd->id = RAPL_DOMAIN_PP0;
615 rd->msrs[0] = MSR_PP0_POWER_LIMIT;
616 rd->msrs[1] = MSR_PP0_ENERGY_STATUS;
617 rd->msrs[2] = 0;
618 rd->msrs[3] = MSR_PP0_POLICY;
619 rd->msrs[4] = 0;
620 rd->rpl[0].prim_id = PL1_ENABLE;
621 rd->rpl[0].name = pl1_name;
622 break;
623 case BIT(RAPL_DOMAIN_PP1):
624 rd->name = rapl_domain_names[RAPL_DOMAIN_PP1];
625 rd->id = RAPL_DOMAIN_PP1;
626 rd->msrs[0] = MSR_PP1_POWER_LIMIT;
627 rd->msrs[1] = MSR_PP1_ENERGY_STATUS;
628 rd->msrs[2] = 0;
629 rd->msrs[3] = MSR_PP1_POLICY;
630 rd->msrs[4] = 0;
631 rd->rpl[0].prim_id = PL1_ENABLE;
632 rd->rpl[0].name = pl1_name;
633 break;
634 case BIT(RAPL_DOMAIN_DRAM):
635 rd->name = rapl_domain_names[RAPL_DOMAIN_DRAM];
636 rd->id = RAPL_DOMAIN_DRAM;
637 rd->msrs[0] = MSR_DRAM_POWER_LIMIT;
638 rd->msrs[1] = MSR_DRAM_ENERGY_STATUS;
639 rd->msrs[2] = MSR_DRAM_PERF_STATUS;
640 rd->msrs[3] = 0;
641 rd->msrs[4] = MSR_DRAM_POWER_INFO;
642 rd->rpl[0].prim_id = PL1_ENABLE;
643 rd->rpl[0].name = pl1_name;
644 break;
645 }
646 if (mask) {
647 rd->package_id = rp->id;
648 rd++;
649 }
650 }
651}
652
653static u64 rapl_unit_xlate(int package, enum unit_type type, u64 value,
654 int to_raw)
655{
656 u64 divisor = 1;
657 int scale = 1; /* scale to user friendly data without floating point */
658 u64 f, y; /* fraction and exp. used for time unit */
659 struct rapl_package *rp;
660
661 rp = find_package_by_id(package);
662 if (!rp)
663 return value;
664
665 switch (type) {
666 case POWER_UNIT:
667 divisor = rp->power_unit_divisor;
668 scale = POWER_UNIT_SCALE;
669 break;
670 case ENERGY_UNIT:
671 scale = ENERGY_UNIT_SCALE;
672 divisor = rp->energy_unit_divisor;
673 break;
674 case TIME_UNIT:
675 divisor = rp->time_unit_divisor;
676 scale = TIME_UNIT_SCALE;
677 /* special processing based on 2^Y*(1+F)/4 = val/divisor, refer
678 * to Intel Software Developer's manual Vol. 3a, CH 14.7.4.
679 */
680 if (!to_raw) {
681 f = (value & 0x60) >> 5;
682 y = value & 0x1f;
683 value = (1 << y) * (4 + f) * scale / 4;
684 return div64_u64(value, divisor);
685 } else {
686 do_div(value, scale);
687 value *= divisor;
688 y = ilog2(value);
689 f = div64_u64(4 * (value - (1 << y)), 1 << y);
690 value = (y & 0x1f) | ((f & 0x3) << 5);
691 return value;
692 }
693 break;
694 case ARBITRARY_UNIT:
695 default:
696 return value;
697 }
698
699 if (to_raw)
700 return div64_u64(value * divisor, scale);
701 else
702 return div64_u64(value * scale, divisor);
703}
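
The TIME_UNIT branch above packs a time window into 7 bits as 2^Y * (1 + F/4) time units, with Y in bits 4:0 and F in bits 6:5. The following standalone sketch illustrates that round trip outside the kernel. The function names are hypothetical, TIME_UNIT_SCALE is assumed here to be 1000000 (microseconds), and unlike the driver the encoder multiplies before dividing so that sub-second windows are not truncated to zero.

```c
#include <stdint.h>

#define TIME_UNIT_SCALE 1000000ULL	/* assumption: values are in microseconds */

/* Decode a raw window (Y in bits 4:0, F in bits 6:5) to microseconds. */
static uint64_t window_raw_to_us(uint64_t raw, uint64_t time_unit_divisor)
{
	uint64_t f = (raw & 0x60) >> 5;
	uint64_t y = raw & 0x1f;

	/* 2^Y * (1 + F/4) units, written as 2^Y * (4 + F) / 4 to stay integral */
	return ((1ULL << y) * (4 + f) * TIME_UNIT_SCALE / 4) / time_unit_divisor;
}

/* Integer log2 for v > 0: position of the highest set bit. */
static unsigned int ilog2_u64(uint64_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Encode microseconds to the raw Y/F form, rounding down to a representable value. */
static uint64_t window_us_to_raw(uint64_t us, uint64_t time_unit_divisor)
{
	uint64_t units = us * time_unit_divisor / TIME_UNIT_SCALE;
	uint64_t y, f;

	if (!units)
		return 0;	/* below one time unit: smallest encodable window */

	y = ilog2_u64(units);
	f = 4 * (units - (1ULL << y)) / (1ULL << y);

	return (y & 0x1f) | ((f & 0x3) << 5);
}
```

With a divisor of 1024 (a time unit of 1/1024 s), a raw value of 10 decodes to 1000000 us and a raw value of 74 (Y = 10, F = 2) decodes to 1500000 us.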
704
705/* in the order of enum rapl_primitives */
706static struct rapl_primitive_info rpi[] = {
707 /* name, mask, shift, msr index, unit type, flag */
708 PRIMITIVE_INFO_INIT(ENERGY_COUNTER, ENERGY_STATUS_MASK, 0,
709 RAPL_DOMAIN_MSR_STATUS, ENERGY_UNIT, 0),
710 PRIMITIVE_INFO_INIT(POWER_LIMIT1, POWER_LIMIT1_MASK, 0,
711 RAPL_DOMAIN_MSR_LIMIT, POWER_UNIT, 0),
712 PRIMITIVE_INFO_INIT(POWER_LIMIT2, POWER_LIMIT2_MASK, 32,
713 RAPL_DOMAIN_MSR_LIMIT, POWER_UNIT, 0),
714 PRIMITIVE_INFO_INIT(FW_LOCK, POWER_PP_LOCK, 31,
715 RAPL_DOMAIN_MSR_LIMIT, ARBITRARY_UNIT, 0),
716 PRIMITIVE_INFO_INIT(PL1_ENABLE, POWER_LIMIT1_ENABLE, 15,
717 RAPL_DOMAIN_MSR_LIMIT, ARBITRARY_UNIT, 0),
718 PRIMITIVE_INFO_INIT(PL1_CLAMP, POWER_LIMIT1_CLAMP, 16,
719 RAPL_DOMAIN_MSR_LIMIT, ARBITRARY_UNIT, 0),
720 PRIMITIVE_INFO_INIT(PL2_ENABLE, POWER_LIMIT2_ENABLE, 47,
721 RAPL_DOMAIN_MSR_LIMIT, ARBITRARY_UNIT, 0),
722 PRIMITIVE_INFO_INIT(PL2_CLAMP, POWER_LIMIT2_CLAMP, 48,
723 RAPL_DOMAIN_MSR_LIMIT, ARBITRARY_UNIT, 0),
724 PRIMITIVE_INFO_INIT(TIME_WINDOW1, TIME_WINDOW1_MASK, 17,
725 RAPL_DOMAIN_MSR_LIMIT, TIME_UNIT, 0),
726 PRIMITIVE_INFO_INIT(TIME_WINDOW2, TIME_WINDOW2_MASK, 49,
727 RAPL_DOMAIN_MSR_LIMIT, TIME_UNIT, 0),
728 PRIMITIVE_INFO_INIT(THERMAL_SPEC_POWER, POWER_INFO_THERMAL_SPEC_MASK,
729 0, RAPL_DOMAIN_MSR_INFO, POWER_UNIT, 0),
730 PRIMITIVE_INFO_INIT(MAX_POWER, POWER_INFO_MAX_MASK, 32,
731 RAPL_DOMAIN_MSR_INFO, POWER_UNIT, 0),
732 PRIMITIVE_INFO_INIT(MIN_POWER, POWER_INFO_MIN_MASK, 16,
733 RAPL_DOMAIN_MSR_INFO, POWER_UNIT, 0),
734 PRIMITIVE_INFO_INIT(MAX_TIME_WINDOW, POWER_INFO_MAX_TIME_WIN_MASK, 48,
735 RAPL_DOMAIN_MSR_INFO, TIME_UNIT, 0),
736 PRIMITIVE_INFO_INIT(THROTTLED_TIME, PERF_STATUS_THROTTLE_TIME_MASK, 0,
737 RAPL_DOMAIN_MSR_PERF, TIME_UNIT, 0),
738 PRIMITIVE_INFO_INIT(PRIORITY_LEVEL, PP_POLICY_MASK, 0,
739 RAPL_DOMAIN_MSR_POLICY, ARBITRARY_UNIT, 0),
740 /* non-hardware */
741 PRIMITIVE_INFO_INIT(AVERAGE_POWER, 0, 0, 0, POWER_UNIT,
742 RAPL_PRIMITIVE_DERIVED),
743 {NULL, 0, 0, 0},
744};
745
746/* Read primitive data based on its related struct rapl_primitive_info.
747 * if xlate flag is set, return translated data based on data units, i.e.
748 * time, energy, and power.
749 * RAPL MSRs are non-architectural and are not laid out consistently across
750 * domains. Here we use primitive info to allow writing consolidated access
751 * functions.
752 * For a given primitive, it is processed by MSR mask and shift. Unit conversion
753 * is pre-assigned based on RAPL unit MSRs read at init time.
754 * 63-------------------------- 31--------------------------- 0
755 * | xxxxx (mask) |
756 * | |<- shift ----------------|
757 * 63-------------------------- 31--------------------------- 0
758 */
759static int rapl_read_data_raw(struct rapl_domain *rd,
760 enum rapl_primitives prim,
761 bool xlate, u64 *data)
762{
763 u64 value, final;
764 u32 msr;
765 struct rapl_primitive_info *rp = &rpi[prim];
766 int cpu;
767
768 if (!rp->name || rp->flag & RAPL_PRIMITIVE_DUMMY)
769 return -EINVAL;
770
771 msr = rd->msrs[rp->id];
772 if (!msr)
773 return -EINVAL;
774 /* use physical package id to look up active cpus */
775 cpu = find_active_cpu_on_package(rd->package_id);
776 if (cpu < 0)
777 return cpu;
778
779 /* special-case package domain, which uses a different bit */
780 if (prim == FW_LOCK && rd->id == RAPL_DOMAIN_PACKAGE) {
781 rp->mask = POWER_PACKAGE_LOCK;
782 rp->shift = 63;
783 }
784 /* non-hardware data are collected by the polling thread */
785 if (rp->flag & RAPL_PRIMITIVE_DERIVED) {
786 *data = rd->rdd.primitives[prim];
787 return 0;
788 }
789
790 if (rdmsrl_safe_on_cpu(cpu, msr, &value)) {
791 pr_debug("failed to read msr 0x%x on cpu %d\n", msr, cpu);
792 return -EIO;
793 }
794
795 final = value & rp->mask;
796 final = final >> rp->shift;
797 if (xlate)
798 *data = rapl_unit_xlate(rd->package_id, rp->unit, final, 0);
799 else
800 *data = final;
801
802 return 0;
803}
804
805/* Similar use of primitive info in the read counterpart */
806static int rapl_write_data_raw(struct rapl_domain *rd,
807 enum rapl_primitives prim,
808 unsigned long long value)
809{
810 u64 msr_val;
811 u32 msr;
812 struct rapl_primitive_info *rp = &rpi[prim];
813 int cpu;
814
815 cpu = find_active_cpu_on_package(rd->package_id);
816 if (cpu < 0)
817 return cpu;
818 msr = rd->msrs[rp->id];
819 if (rdmsrl_safe_on_cpu(cpu, msr, &msr_val)) {
820 dev_dbg(&rd->power_zone.dev,
821 "failed to read msr 0x%x on cpu %d\n", msr, cpu);
822 return -EIO;
823 }
824 value = rapl_unit_xlate(rd->package_id, rp->unit, value, 1);
825 msr_val &= ~rp->mask;
826 msr_val |= value << rp->shift;
827 if (wrmsrl_safe_on_cpu(cpu, msr, msr_val)) {
828 dev_dbg(&rd->power_zone.dev,
829 "failed to write msr 0x%x on cpu %d\n", msr, cpu);
830 return -EIO;
831 }
832
833 return 0;
834}
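
Both accessors above reduce to the same mask-and-shift pattern: a read isolates one field of the 64-bit MSR value, and a write is a read-modify-write that replaces only that field. The standalone sketch below shows the pattern on a plain integer. struct field and both helpers are hypothetical names. The shifts (15 and 17) are taken from the rpi[] table, while the mask values are assumptions because the mask macros are defined outside this excerpt. Unlike rapl_write_data_raw(), the setter also masks the shifted value so an oversized input cannot spill into neighbouring fields.

```c
#include <stdint.h>

/* Hypothetical descriptor: a field is a mask plus the shift of its low bit. */
struct field {
	uint64_t mask;
	unsigned int shift;
};

/* Shifts follow the rpi[] table; the mask widths are assumed for illustration. */
static const struct field pl1_enable   = { 1ULL << 15, 15 };
static const struct field time_window1 = { 0x7FULL << 17, 17 };

/* Extract one field from a 64-bit register value. */
static uint64_t field_get(uint64_t reg, struct field f)
{
	return (reg & f.mask) >> f.shift;
}

/* Return reg with one field replaced and every other bit preserved. */
static uint64_t field_set(uint64_t reg, struct field f, uint64_t val)
{
	reg &= ~f.mask;				/* clear the old field */
	reg |= (val << f.shift) & f.mask;	/* insert the new value, clipped to the field */
	return reg;
}
```

In the driver the register value comes from rdmsrl_safe_on_cpu() and goes back through wrmsrl_safe_on_cpu(); the sketch leaves that I/O out.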
835
836static int rapl_check_unit(struct rapl_package *rp, int cpu)
837{
838 u64 msr_val;
839 u32 value;
840
841 if (rdmsrl_safe_on_cpu(cpu, MSR_RAPL_POWER_UNIT, &msr_val)) {
842 pr_err("Failed to read power unit MSR 0x%x on CPU %d, exit.\n",
843 MSR_RAPL_POWER_UNIT, cpu);
844 return -ENODEV;
845 }
846
847 /* Raw RAPL data stored in MSRs are in certain scales. We need to
848 * convert them into standard units based on the divisors reported in
849 * the RAPL unit MSRs.
850 * i.e.
851 * energy unit: 1/energy_unit_divisor Joules
852 * power unit: 1/power_unit_divisor Watts
853 * time unit: 1/time_unit_divisor Seconds
854 */
855 value = (msr_val & ENERGY_UNIT_MASK) >> ENERGY_UNIT_OFFSET;
856 rp->energy_unit_divisor = 1 << value;
857
858
859 value = (msr_val & POWER_UNIT_MASK) >> POWER_UNIT_OFFSET;
860 rp->power_unit_divisor = 1 << value;
861
862 value = (msr_val & TIME_UNIT_MASK) >> TIME_UNIT_OFFSET;
863 rp->time_unit_divisor = 1 << value;
864
865 pr_debug("Physical package %d units: energy=%d, time=%d, power=%d\n",
866 rp->id,
867 rp->energy_unit_divisor,
868 rp->time_unit_divisor,
869 rp->power_unit_divisor);
870
871 return 0;
872}
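
To make the divisor arithmetic in rapl_check_unit() concrete, here is a standalone sketch that decodes a unit register value and converts a raw energy count. The bit positions (power in bits 3:0, energy in bits 12:8, time in bits 19:16) follow the Intel SDM description of MSR_RAPL_POWER_UNIT as I understand it; the driver's own mask macros are defined outside this excerpt. The struct and function names are hypothetical, and the microjoule scale is an assumption based on the get_energy_uj callback name.

```c
#include <stdint.h>

/* Hypothetical container for the three divisors derived from the unit MSR. */
struct rapl_units {
	uint32_t power_div;	/* power unit  = 1/power_div  W */
	uint32_t energy_div;	/* energy unit = 1/energy_div J */
	uint32_t time_div;	/* time unit   = 1/time_div   s */
};

/* Each field holds an exponent N; the unit is 1/2^N of the base unit. */
static struct rapl_units decode_units(uint64_t msr)
{
	struct rapl_units u;

	u.power_div  = 1u << (msr & 0xf);		/* bits 3:0   */
	u.energy_div = 1u << ((msr >> 8) & 0x1f);	/* bits 12:8  */
	u.time_div   = 1u << ((msr >> 16) & 0xf);	/* bits 19:16 */

	return u;
}

/* Convert a raw 32-bit energy count to microjoules (assumed scale of 10^6). */
static uint64_t energy_raw_to_uj(uint64_t raw, uint32_t energy_div)
{
	return raw * 1000000ULL / energy_div;
}
```

For an illustrative register value of 0xA1003 this gives divisors of 8 (power), 65536 (energy) and 1024 (time), so a raw energy count of 65536 corresponds to 1000000 uJ.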
873
874/* REVISIT:
875 * When package power limit is set artificially low by RAPL, LVT
876 * thermal interrupt for package power limit should be ignored
877 * since we are not really exceeding the real limit. The intention
878 * is to avoid excessive interrupts while we are trying to save power.
879 * A useful feature might be routing the package_power_limit interrupt
880 * to userspace via eventfd. Once we have a use case, this is simple
881 * to do by adding an atomic notifier.
882 */
883
884static void package_power_limit_irq_save(int package_id)
885{
886 u32 l, h = 0;
887 int cpu;
888 struct rapl_package *rp;
889
890 rp = find_package_by_id(package_id);
891 if (!rp)
892 return;
893
894 if (!boot_cpu_has(X86_FEATURE_PTS) || !boot_cpu_has(X86_FEATURE_PLN))
895 return;
896
897 cpu = find_active_cpu_on_package(package_id);
898 if (cpu < 0)
899 return;
900 /* save the state of PLN irq mask bit before disabling it */
901 rdmsr_safe_on_cpu(cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT, &l, &h);
902 if (!(rp->power_limit_irq & PACKAGE_PLN_INT_SAVED)) {
903 rp->power_limit_irq = l & PACKAGE_THERM_INT_PLN_ENABLE;
904 rp->power_limit_irq |= PACKAGE_PLN_INT_SAVED;
905 }
906 l &= ~PACKAGE_THERM_INT_PLN_ENABLE;
907 wrmsr_on_cpu(cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT, l, h);
908}
909
910/* restore per package power limit interrupt enable state */
911static void package_power_limit_irq_restore(int package_id)
912{
913 u32 l, h;
914 int cpu;
915 struct rapl_package *rp;
916
917 rp = find_package_by_id(package_id);
918 if (!rp)
919 return;
920
921 if (!boot_cpu_has(X86_FEATURE_PTS) || !boot_cpu_has(X86_FEATURE_PLN))
922 return;
923
924 cpu = find_active_cpu_on_package(package_id);
925 if (cpu < 0)
926 return;
927
928 /* irq enable state not saved, nothing to restore */
929 if (!(rp->power_limit_irq & PACKAGE_PLN_INT_SAVED))
930 return;
931 rdmsr_safe_on_cpu(cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT, &l, &h);
932
933 if (rp->power_limit_irq & PACKAGE_THERM_INT_PLN_ENABLE)
934 l |= PACKAGE_THERM_INT_PLN_ENABLE;
935 else
936 l &= ~PACKAGE_THERM_INT_PLN_ENABLE;
937
938 wrmsr_on_cpu(cpu, MSR_IA32_PACKAGE_THERM_INTERRUPT, l, h);
939}
940
941static const struct x86_cpu_id rapl_ids[] = {
942 { X86_VENDOR_INTEL, 6, 0x2a},/* SNB */
943 { X86_VENDOR_INTEL, 6, 0x2d},/* SNB EP */
944 { X86_VENDOR_INTEL, 6, 0x3a},/* IVB */
945 { X86_VENDOR_INTEL, 6, 0x45},/* HSW */
946 /* TODO: Add more CPU IDs after testing */
947 {}
948};
949MODULE_DEVICE_TABLE(x86cpu, rapl_ids);
950
951/* read once for all raw primitive data for all packages, domains */
952static void rapl_update_domain_data(void)
953{
954 int dmn, prim;
955 u64 val;
956 struct rapl_package *rp;
957
958 list_for_each_entry(rp, &rapl_packages, plist) {
959 for (dmn = 0; dmn < rp->nr_domains; dmn++) {
960 pr_debug("update package %d domain %s data\n", rp->id,
961 rp->domains[dmn].name);
962 /* exclude non-raw primitives */
963 for (prim = 0; prim < NR_RAW_PRIMITIVES; prim++)
964 if (!rapl_read_data_raw(&rp->domains[dmn], prim,
965 rpi[prim].unit,
966 &val))
967 rp->domains[dmn].rdd.primitives[prim] =
968 val;
969 }
970 }
971
972}
973
974static int rapl_unregister_powercap(void)
975{
976 struct rapl_package *rp;
977 struct rapl_domain *rd, *rd_package = NULL;
978
979 /* unregister all active rapl packages from the powercap layer,
980 * hotplug lock held
981 */
982 list_for_each_entry(rp, &rapl_packages, plist) {
983 package_power_limit_irq_restore(rp->id);
984
985 for (rd = rp->domains; rd < rp->domains + rp->nr_domains;
986 rd++) {
987 pr_debug("remove package, undo power limit on %d: %s\n",
988 rp->id, rd->name);
989 rapl_write_data_raw(rd, PL1_ENABLE, 0);
990 rapl_write_data_raw(rd, PL2_ENABLE, 0);
991 rapl_write_data_raw(rd, PL1_CLAMP, 0);
992 rapl_write_data_raw(rd, PL2_CLAMP, 0);
993 if (rd->id == RAPL_DOMAIN_PACKAGE) {
994 rd_package = rd;
995 continue;
996 }
997 powercap_unregister_zone(control_type, &rd->power_zone);
998 }
999 /* do the package zone last */
1000 if (rd_package)
1001 powercap_unregister_zone(control_type,
1002 &rd_package->power_zone);
1003 }
1004 powercap_unregister_control_type(control_type);
1005
1006 return 0;
1007}
1008
1009static int rapl_package_register_powercap(struct rapl_package *rp)
1010{
1011 struct rapl_domain *rd;
1012 int ret = 0;
1013 char dev_name[17]; /* max domain name = 7 + 1 + 8 for int + 1 for null*/
1014 struct powercap_zone *power_zone = NULL;
1015 int nr_pl;
1016
1017 /* first we register package domain as the parent zone */
1018 for (rd = rp->domains; rd < rp->domains + rp->nr_domains; rd++) {
1019 if (rd->id == RAPL_DOMAIN_PACKAGE) {
1020 nr_pl = find_nr_power_limit(rd);
1021 pr_debug("register socket %d package domain %s\n",
1022 rp->id, rd->name);
1023 memset(dev_name, 0, sizeof(dev_name));
1024 snprintf(dev_name, sizeof(dev_name), "%s-%d",
1025 rd->name, rp->id);
1026 power_zone = powercap_register_zone(&rd->power_zone,
1027 control_type,
1028 dev_name, NULL,
1029 &zone_ops[rd->id],
1030 nr_pl,
1031 &constraint_ops);
1032 if (IS_ERR(power_zone)) {
1033 pr_debug("failed to register package, %d\n",
1034 rp->id);
1035 ret = PTR_ERR(power_zone);
1036 goto exit_package;
1037 }
1038 /* track parent zone in per package/socket data */
1039 rp->power_zone = power_zone;
1040 /* done, only one package domain per socket */
1041 break;
1042 }
1043 }
1044 if (!power_zone) {
1045 pr_err("no package domain found, unknown topology!\n");
1046 ret = -ENODEV;
1047 goto exit_package;
1048 }
1049 /* now register domains as children of the socket/package */
1050 for (rd = rp->domains; rd < rp->domains + rp->nr_domains; rd++) {
1051 if (rd->id == RAPL_DOMAIN_PACKAGE)
1052 continue;
1053 /* number of power limits per domain varies */
1054 nr_pl = find_nr_power_limit(rd);
1055 power_zone = powercap_register_zone(&rd->power_zone,
1056 control_type, rd->name,
1057 rp->power_zone,
1058 &zone_ops[rd->id], nr_pl,
1059 &constraint_ops);
1060
1061 if (IS_ERR(power_zone)) {
1062 pr_debug("failed to register power_zone, %d:%s:%s\n",
1063 rp->id, rd->name, dev_name);
1064 ret = PTR_ERR(power_zone);
1065 goto err_cleanup;
1066 }
1067 }
1068
1069exit_package:
1070 return ret;
1071err_cleanup:
1072 /* clean up previously initialized domains within the package if we
1073 * failed after the first domain setup.
1074 */
1075 while (--rd >= rp->domains) {
1076 pr_debug("unregister package %d domain %s\n", rp->id, rd->name);
1077 powercap_unregister_zone(control_type, &rd->power_zone);
1078 }
1079
1080 return ret;
1081}
1082
1083static int rapl_register_powercap(void)
1084{
1085 struct rapl_domain *rd;
1086 struct rapl_package *rp;
1087 int ret = 0;
1088
1089 control_type = powercap_register_control_type(NULL, "intel-rapl", NULL);
1090 if (IS_ERR(control_type)) {
1091 pr_debug("failed to register powercap control_type.\n");
1092 return PTR_ERR(control_type);
1093 }
1094 /* read the initial data */
1095 rapl_update_domain_data();
1096 list_for_each_entry(rp, &rapl_packages, plist)
1097 if (rapl_package_register_powercap(rp))
1098 goto err_cleanup_package;
1099 return ret;
1100
1101err_cleanup_package:
1102 /* clean up previously initialized packages */
1103 list_for_each_entry_continue_reverse(rp, &rapl_packages, plist) {
1104 for (rd = rp->domains; rd < rp->domains + rp->nr_domains;
1105 rd++) {
1106 pr_debug("unregister zone/package %d, %s domain\n",
1107 rp->id, rd->name);
1108 powercap_unregister_zone(control_type, &rd->power_zone);
1109 }
1110 }
1111
1112 return ret;
1113}
1114
1115static int rapl_check_domain(int cpu, int domain)
1116{
1117 unsigned msr;
1118 u64 val1, val2 = 0;
1119 int retry = 0;
1120
1121 switch (domain) {
1122 case RAPL_DOMAIN_PACKAGE:
1123 msr = MSR_PKG_ENERGY_STATUS;
1124 break;
1125 case RAPL_DOMAIN_PP0:
1126 msr = MSR_PP0_ENERGY_STATUS;
1127 break;
1128 case RAPL_DOMAIN_PP1:
1129 msr = MSR_PP1_ENERGY_STATUS;
1130 break;
1131 case RAPL_DOMAIN_DRAM:
1132 msr = MSR_DRAM_ENERGY_STATUS;
1133 break;
1134 default:
1135 pr_err("invalid domain id %d\n", domain);
1136 return -EINVAL;
1137 }
1138 if (rdmsrl_safe_on_cpu(cpu, msr, &val1))
1139 return -ENODEV;
1140
1141 /* energy counters roll slowly on some domains */
1142 while (++retry < 10) {
1143 usleep_range(10000, 15000);
1144 rdmsrl_safe_on_cpu(cpu, msr, &val2);
1145 if ((val1 & ENERGY_STATUS_MASK) != (val2 & ENERGY_STATUS_MASK))
1146 return 0;
1147 }
1148 /* if energy counter does not change, report as bad domain */
1149 pr_info("domain %s energy ctr %llu:%llu not working, skip\n",
1150 rapl_domain_names[domain], val1, val2);
1151
1152 return -ENODEV;
1153}
1154
1155/* Detect active and valid domains for the given CPU, caller must
1156 * ensure the CPU belongs to the targeted package and CPU hotplug is disabled.
1157 */
1158static int rapl_detect_domains(struct rapl_package *rp, int cpu)
1159{
1160 int i;
1161 int ret = 0;
1162 struct rapl_domain *rd;
1163 u64 locked;
1164
1165 for (i = 0; i < RAPL_DOMAIN_MAX; i++) {
1166 /* use physical package id to read counters */
1167 if (!rapl_check_domain(cpu, i))
1168 rp->domain_map |= 1 << i;
1169 }
1170 rp->nr_domains = bitmap_weight(&rp->domain_map, RAPL_DOMAIN_MAX);
1171 if (!rp->nr_domains) {
1172 pr_err("no valid rapl domains found in package %d\n", rp->id);
1173 ret = -ENODEV;
1174 goto done;
1175 }
1176 pr_debug("found %d domains on package %d\n", rp->nr_domains, rp->id);
1177
1178 rp->domains = kcalloc(rp->nr_domains + 1, sizeof(struct rapl_domain),
1179 GFP_KERNEL);
1180 if (!rp->domains) {
1181 ret = -ENOMEM;
1182 goto done;
1183 }
1184 rapl_init_domains(rp);
1185
1186 for (rd = rp->domains; rd < rp->domains + rp->nr_domains; rd++) {
1187 /* check if the domain is locked by BIOS */
1188 if (rapl_read_data_raw(rd, FW_LOCK, false, &locked)) {
1189 pr_info("RAPL package %d domain %s locked by BIOS\n",
1190 rp->id, rd->name);
1191 rd->state |= DOMAIN_STATE_BIOS_LOCKED;
1192 }
1193 }
1194
1195
1196done:
1197 return ret;
1198}
1199
1200static bool is_package_new(int package)
1201{
1202 struct rapl_package *rp;
1203
1204 /* caller prevents cpu hotplug, there will be no new packages added
1205 * or deleted while traversing the package list, no need for locking.
1206 */
1207 list_for_each_entry(rp, &rapl_packages, plist)
1208 if (package == rp->id)
1209 return false;
1210
1211 return true;
1212}
1213
1214/* RAPL interface can be made of a two-level hierarchy: package level and domain
1215 * level. We first detect the number of packages then domains of each package.
1216 * We have to consider the possibility of CPU online/offline due to hotplug and
1217 * other scenarios.
1218 */
1219static int rapl_detect_topology(void)
1220{
1221 int i;
1222 int phy_package_id;
1223 struct rapl_package *new_package, *rp;
1224
1225 for_each_online_cpu(i) {
1226 phy_package_id = topology_physical_package_id(i);
1227 if (is_package_new(phy_package_id)) {
1228 new_package = kzalloc(sizeof(*rp), GFP_KERNEL);
1229 if (!new_package) {
1230 rapl_cleanup_data();
1231 return -ENOMEM;
1232 }
1233 /* add the new package to the list */
1234 new_package->id = phy_package_id;
1235 new_package->nr_cpus = 1;
1236
1237 /* check if the package contains valid domains */
1238 if (rapl_detect_domains(new_package, i) ||
1239 rapl_check_unit(new_package, i)) {
1240 kfree(new_package->domains);
1241 kfree(new_package);
1242 /* free up the packages already initialized */
1243 rapl_cleanup_data();
1244 return -ENODEV;
1245 }
1246 INIT_LIST_HEAD(&new_package->plist);
1247 list_add(&new_package->plist, &rapl_packages);
1248 } else {
1249 rp = find_package_by_id(phy_package_id);
1250 if (rp)
1251 ++rp->nr_cpus;
1252 }
1253 }
1254
1255 return 0;
1256}
1257
1258/* called from CPU hotplug notifier, hotplug lock held */
1259static void rapl_remove_package(struct rapl_package *rp)
1260{
1261 struct rapl_domain *rd, *rd_package = NULL;
1262
1263 for (rd = rp->domains; rd < rp->domains + rp->nr_domains; rd++) {
1264 if (rd->id == RAPL_DOMAIN_PACKAGE) {
1265 rd_package = rd;
1266 continue;
1267 }
1268 pr_debug("remove package %d, %s domain\n", rp->id, rd->name);
1269 powercap_unregister_zone(control_type, &rd->power_zone);
1270 }
1271 /* do parent zone last */
1272 powercap_unregister_zone(control_type, &rd_package->power_zone);
1273 list_del(&rp->plist);
1274 kfree(rp);
1275}
1276
1277/* called from CPU hotplug notifier, hotplug lock held */
1278static int rapl_add_package(int cpu)
1279{
1280 int ret = 0;
1281 int phy_package_id;
1282 struct rapl_package *rp;
1283
1284 phy_package_id = topology_physical_package_id(cpu);
1285 rp = kzalloc(sizeof(struct rapl_package), GFP_KERNEL);
1286 if (!rp)
1287 return -ENOMEM;
1288
1289 /* add the new package to the list */
1290 rp->id = phy_package_id;
1291 rp->nr_cpus = 1;
1292 /* check if the package contains valid domains */
1293 if (rapl_detect_domains(rp, cpu) ||
1294 rapl_check_unit(rp, cpu)) {
1295 ret = -ENODEV;
1296 goto err_free_package;
1297 }
1298 if (!rapl_package_register_powercap(rp)) {
1299 INIT_LIST_HEAD(&rp->plist);
1300 list_add(&rp->plist, &rapl_packages);
1301 return ret;
1302 }
1303
1304err_free_package:
1305 kfree(rp->domains);
1306 kfree(rp);
1307
1308 return ret;
1309}
1310
1311/* Handles CPU hotplug on multi-socket systems.
1312 * If a CPU goes online as the first CPU of the physical package
1313 * we add the RAPL package to the system. Similarly, when the last
1314 * CPU of the package is removed, we remove the RAPL package and its
1315 * associated domains. Cooling devices are handled accordingly at
1316 * per-domain level.
1317 */
1318static int rapl_cpu_callback(struct notifier_block *nfb,
1319 unsigned long action, void *hcpu)
1320{
1321 unsigned long cpu = (unsigned long)hcpu;
1322 int phy_package_id;
1323 struct rapl_package *rp;
1324
1325 phy_package_id = topology_physical_package_id(cpu);
1326 switch (action) {
1327 case CPU_ONLINE:
1328 case CPU_ONLINE_FROZEN:
1329 case CPU_DOWN_FAILED:
1330 case CPU_DOWN_FAILED_FROZEN:
1331 rp = find_package_by_id(phy_package_id);
1332 if (rp)
1333 ++rp->nr_cpus;
1334 else
1335 rapl_add_package(cpu);
1336 break;
1337 case CPU_DOWN_PREPARE:
1338 case CPU_DOWN_PREPARE_FROZEN:
1339 rp = find_package_by_id(phy_package_id);
1340 if (!rp)
1341 break;
1342 if (--rp->nr_cpus == 0)
1343 rapl_remove_package(rp);
1344 }
1345
1346 return NOTIFY_OK;
1347}
1348
1349static struct notifier_block rapl_cpu_notifier = {
1350 .notifier_call = rapl_cpu_callback,
1351};
1352
1353static int __init rapl_init(void)
1354{
1355 int ret = 0;
1356
1357 if (!x86_match_cpu(rapl_ids)) {
1358 pr_err("driver does not support CPU family %d model %d\n",
1359 boot_cpu_data.x86, boot_cpu_data.x86_model);
1360
1361 return -ENODEV;
1362 }
1363 /* prevent CPU hotplug during detection */
1364 get_online_cpus();
1365 ret = rapl_detect_topology();
1366 if (ret)
1367 goto done;
1368
1369 if (rapl_register_powercap()) {
1370 rapl_cleanup_data();
1371 ret = -ENODEV;
1372 goto done;
1373 }
1374 register_hotcpu_notifier(&rapl_cpu_notifier);
1375done:
1376 put_online_cpus();
1377
1378 return ret;
1379}
1380
1381static void __exit rapl_exit(void)
1382{
1383 get_online_cpus();
1384 unregister_hotcpu_notifier(&rapl_cpu_notifier);
1385 rapl_unregister_powercap();
1386 rapl_cleanup_data();
1387 put_online_cpus();
1388}
1389
1390module_init(rapl_init);
1391module_exit(rapl_exit);
1392
1393MODULE_DESCRIPTION("Driver for Intel RAPL (Running Average Power Limit)");
1394MODULE_AUTHOR("Jacob Pan <jacob.jun.pan@intel.com>");
1395MODULE_LICENSE("GPL v2");
diff --git a/drivers/powercap/powercap_sys.c b/drivers/powercap/powercap_sys.c
new file mode 100644
index 000000000000..21814f90a44b
--- /dev/null
+++ b/drivers/powercap/powercap_sys.c
@@ -0,0 +1,685 @@
1/*
2 * Power capping class
3 * Copyright (c) 2013, Intel Corporation.
4 *
5 * This program is free software; you can redistribute it and/or modify it
6 * under the terms and conditions of the GNU General Public License,
7 * version 2, as published by the Free Software Foundation.
8 *
9 * This program is distributed in the hope it will be useful, but WITHOUT
10 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
11 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
12 * more details.
13 *
14 * You should have received a copy of the GNU General Public License along with
15 * this program; if not, write to the Free Software Foundation, Inc.
16 *
17 */
18
19#include <linux/module.h>
20#include <linux/device.h>
21#include <linux/err.h>
22#include <linux/slab.h>
23#include <linux/powercap.h>
24
25#define to_powercap_zone(n) container_of(n, struct powercap_zone, dev)
26#define to_powercap_control_type(n) \
27 container_of(n, struct powercap_control_type, dev)
28
29/* Power zone show function */
30#define define_power_zone_show(_attr) \
31static ssize_t _attr##_show(struct device *dev, \
32 struct device_attribute *dev_attr,\
33 char *buf) \
34{ \
35 u64 value; \
36 ssize_t len = -EINVAL; \
37 struct powercap_zone *power_zone = to_powercap_zone(dev); \
38 \
39 if (power_zone->ops->get_##_attr) { \
40 if (!power_zone->ops->get_##_attr(power_zone, &value)) \
41 len = sprintf(buf, "%lld\n", value); \
42 } \
43 \
44 return len; \
45}
46
47/* The only meaningful input is 0 (reset), others are silently ignored */
48#define define_power_zone_store(_attr) \
49static ssize_t _attr##_store(struct device *dev,\
50 struct device_attribute *dev_attr, \
51 const char *buf, size_t count) \
52{ \
53 int err; \
54 struct powercap_zone *power_zone = to_powercap_zone(dev); \
55 u64 value; \
56 \
57 err = kstrtoull(buf, 10, &value); \
58 if (err) \
59 return -EINVAL; \
60 if (value) \
61 return count; \
62 if (power_zone->ops->reset_##_attr) { \
63 if (!power_zone->ops->reset_##_attr(power_zone)) \
64 return count; \
65 } \
66 \
67 return -EINVAL; \
68}
69
70/* Power zone constraint show function */
71#define define_power_zone_constraint_show(_attr) \
72static ssize_t show_constraint_##_attr(struct device *dev, \
73 struct device_attribute *dev_attr,\
74 char *buf) \
75{ \
76 u64 value; \
77 ssize_t len = -ENODATA; \
78 struct powercap_zone *power_zone = to_powercap_zone(dev); \
79 int id; \
80 struct powercap_zone_constraint *pconst;\
81 \
82 if (!sscanf(dev_attr->attr.name, "constraint_%d_", &id)) \
83 return -EINVAL; \
84 if (id >= power_zone->const_id_cnt) \
85 return -EINVAL; \
86 pconst = &power_zone->constraints[id]; \
87 if (pconst && pconst->ops && pconst->ops->get_##_attr) { \
88 if (!pconst->ops->get_##_attr(power_zone, id, &value)) \
89 len = sprintf(buf, "%lld\n", value); \
90 } \
91 \
92 return len; \
93}
94
95/* Power zone constraint store function */
96#define define_power_zone_constraint_store(_attr) \
97static ssize_t store_constraint_##_attr(struct device *dev,\
98 struct device_attribute *dev_attr, \
99 const char *buf, size_t count) \
100{ \
101 int err; \
102 u64 value; \
103 struct powercap_zone *power_zone = to_powercap_zone(dev); \
104 int id; \
105 struct powercap_zone_constraint *pconst;\
106 \
107 if (!sscanf(dev_attr->attr.name, "constraint_%d_", &id)) \
108 return -EINVAL; \
109 if (id >= power_zone->const_id_cnt) \
110 return -EINVAL; \
111 pconst = &power_zone->constraints[id]; \
112 err = kstrtoull(buf, 10, &value); \
113 if (err) \
114 return -EINVAL; \
115 if (pconst && pconst->ops && pconst->ops->set_##_attr) { \
116 if (!pconst->ops->set_##_attr(power_zone, id, value)) \
117 return count; \
118 } \
119 \
120 return -ENODATA; \
121}
122
123/* Power zone information callbacks */
124define_power_zone_show(power_uw);
125define_power_zone_show(max_power_range_uw);
126define_power_zone_show(energy_uj);
127define_power_zone_store(energy_uj);
128define_power_zone_show(max_energy_range_uj);
129
130/* Power zone attributes */
131static DEVICE_ATTR_RO(max_power_range_uw);
132static DEVICE_ATTR_RO(power_uw);
133static DEVICE_ATTR_RO(max_energy_range_uj);
134static DEVICE_ATTR_RW(energy_uj);
135
136/* Power zone constraint attributes callbacks */
137define_power_zone_constraint_show(power_limit_uw);
138define_power_zone_constraint_store(power_limit_uw);
139define_power_zone_constraint_show(time_window_us);
140define_power_zone_constraint_store(time_window_us);
141define_power_zone_constraint_show(max_power_uw);
142define_power_zone_constraint_show(min_power_uw);
143define_power_zone_constraint_show(max_time_window_us);
144define_power_zone_constraint_show(min_time_window_us);
145
146/* For one time seeding of constraint device attributes */
147struct powercap_constraint_attr {
148 struct device_attribute power_limit_attr;
149 struct device_attribute time_window_attr;
150 struct device_attribute max_power_attr;
151 struct device_attribute min_power_attr;
152 struct device_attribute max_time_window_attr;
153 struct device_attribute min_time_window_attr;
154 struct device_attribute name_attr;
155};
156
157static struct powercap_constraint_attr
158 constraint_attrs[MAX_CONSTRAINTS_PER_ZONE];
159
160/* A list of powercap control_types */
161static LIST_HEAD(powercap_cntrl_list);
162/* Mutex to protect list of powercap control_types */
163static DEFINE_MUTEX(powercap_cntrl_list_lock);
164
165#define POWERCAP_CONSTRAINT_NAME_LEN 30 /* Some limit to avoid overflow */
166static ssize_t show_constraint_name(struct device *dev,
167 struct device_attribute *dev_attr,
168 char *buf)
169{
170 const char *name;
171 struct powercap_zone *power_zone = to_powercap_zone(dev);
172 int id;
173 ssize_t len = -ENODATA;
174 struct powercap_zone_constraint *pconst;
175
176 if (!sscanf(dev_attr->attr.name, "constraint_%d_", &id))
177 return -EINVAL;
178 if (id >= power_zone->const_id_cnt)
179 return -EINVAL;
180 pconst = &power_zone->constraints[id];
181
182 if (pconst && pconst->ops && pconst->ops->get_name) {
183 name = pconst->ops->get_name(power_zone, id);
184 if (name) {
185 snprintf(buf, POWERCAP_CONSTRAINT_NAME_LEN,
186 "%s\n", name);
187 buf[POWERCAP_CONSTRAINT_NAME_LEN] = '\0';
188 len = strlen(buf);
189 }
190 }
191
192 return len;
193}
194
195static int create_constraint_attribute(int id, const char *name,
196 int mode,
197 struct device_attribute *dev_attr,
198 ssize_t (*show)(struct device *,
199 struct device_attribute *, char *),
200 ssize_t (*store)(struct device *,
201 struct device_attribute *,
202 const char *, size_t)
203 )
204{
205
206 dev_attr->attr.name = kasprintf(GFP_KERNEL, "constraint_%d_%s",
207 id, name);
208 if (!dev_attr->attr.name)
209 return -ENOMEM;
210 dev_attr->attr.mode = mode;
211 dev_attr->show = show;
212 dev_attr->store = store;
213
214 return 0;
215}
216
217static void free_constraint_attributes(void)
218{
219 int i;
220
221 for (i = 0; i < MAX_CONSTRAINTS_PER_ZONE; ++i) {
222 kfree(constraint_attrs[i].power_limit_attr.attr.name);
223 kfree(constraint_attrs[i].time_window_attr.attr.name);
224 kfree(constraint_attrs[i].name_attr.attr.name);
225 kfree(constraint_attrs[i].max_power_attr.attr.name);
226 kfree(constraint_attrs[i].min_power_attr.attr.name);
227 kfree(constraint_attrs[i].max_time_window_attr.attr.name);
228 kfree(constraint_attrs[i].min_time_window_attr.attr.name);
229 }
230}
231
232static int seed_constraint_attributes(void)
233{
234 int i;
235 int ret;
236
237 for (i = 0; i < MAX_CONSTRAINTS_PER_ZONE; ++i) {
238 ret = create_constraint_attribute(i, "power_limit_uw",
239 S_IWUSR | S_IRUGO,
240 &constraint_attrs[i].power_limit_attr,
241 show_constraint_power_limit_uw,
242 store_constraint_power_limit_uw);
243 if (ret)
244 goto err_alloc;
245 ret = create_constraint_attribute(i, "time_window_us",
246 S_IWUSR | S_IRUGO,
247 &constraint_attrs[i].time_window_attr,
248 show_constraint_time_window_us,
249 store_constraint_time_window_us);
250 if (ret)
251 goto err_alloc;
252 ret = create_constraint_attribute(i, "name", S_IRUGO,
253 &constraint_attrs[i].name_attr,
254 show_constraint_name,
255 NULL);
256 if (ret)
257 goto err_alloc;
258 ret = create_constraint_attribute(i, "max_power_uw", S_IRUGO,
259 &constraint_attrs[i].max_power_attr,
260 show_constraint_max_power_uw,
261 NULL);
262 if (ret)
263 goto err_alloc;
264 ret = create_constraint_attribute(i, "min_power_uw", S_IRUGO,
265 &constraint_attrs[i].min_power_attr,
266 show_constraint_min_power_uw,
267 NULL);
268 if (ret)
269 goto err_alloc;
270 ret = create_constraint_attribute(i, "max_time_window_us",
271 S_IRUGO,
272 &constraint_attrs[i].max_time_window_attr,
273 show_constraint_max_time_window_us,
274 NULL);
275 if (ret)
276 goto err_alloc;
277 ret = create_constraint_attribute(i, "min_time_window_us",
278 S_IRUGO,
279 &constraint_attrs[i].min_time_window_attr,
280 show_constraint_min_time_window_us,
281 NULL);
282 if (ret)
283 goto err_alloc;
284
285 }
286
287 return 0;
288
289err_alloc:
290 free_constraint_attributes();
291
292 return ret;
293}
294
295static int create_constraints(struct powercap_zone *power_zone,
296 int nr_constraints,
297 struct powercap_zone_constraint_ops *const_ops)
298{
299 int i;
300 int ret = 0;
301 int count;
302 struct powercap_zone_constraint *pconst;
303
304 if (!power_zone || !const_ops || !const_ops->get_power_limit_uw ||
305 !const_ops->set_power_limit_uw ||
306 !const_ops->get_time_window_us ||
307 !const_ops->set_time_window_us)
308 return -EINVAL;
309
310 count = power_zone->zone_attr_count;
311 for (i = 0; i < nr_constraints; ++i) {
312 pconst = &power_zone->constraints[i];
313 pconst->ops = const_ops;
314 pconst->id = power_zone->const_id_cnt;
315 power_zone->const_id_cnt++;
316 power_zone->zone_dev_attrs[count++] =
317 &constraint_attrs[i].power_limit_attr.attr;
318 power_zone->zone_dev_attrs[count++] =
319 &constraint_attrs[i].time_window_attr.attr;
320 if (pconst->ops->get_name)
321 power_zone->zone_dev_attrs[count++] =
322 &constraint_attrs[i].name_attr.attr;
323 if (pconst->ops->get_max_power_uw)
324 power_zone->zone_dev_attrs[count++] =
325 &constraint_attrs[i].max_power_attr.attr;
326 if (pconst->ops->get_min_power_uw)
327 power_zone->zone_dev_attrs[count++] =
328 &constraint_attrs[i].min_power_attr.attr;
329 if (pconst->ops->get_max_time_window_us)
330 power_zone->zone_dev_attrs[count++] =
331 &constraint_attrs[i].max_time_window_attr.attr;
332 if (pconst->ops->get_min_time_window_us)
333 power_zone->zone_dev_attrs[count++] =
334 &constraint_attrs[i].min_time_window_attr.attr;
335 }
336 power_zone->zone_attr_count = count;
337
338 return ret;
339}
340
341static bool control_type_valid(void *control_type)
342{
343 struct powercap_control_type *pos = NULL;
344 bool found = false;
345
346 mutex_lock(&powercap_cntrl_list_lock);
347
348 list_for_each_entry(pos, &powercap_cntrl_list, node) {
349 if (pos == control_type) {
350 found = true;
351 break;
352 }
353 }
354 mutex_unlock(&powercap_cntrl_list_lock);
355
356 return found;
357}
358
359static ssize_t name_show(struct device *dev,
360 struct device_attribute *attr,
361 char *buf)
362{
363 struct powercap_zone *power_zone = to_powercap_zone(dev);
364
365 return sprintf(buf, "%s\n", power_zone->name);
366}
367
368static DEVICE_ATTR_RO(name);
369
370/* Create zone and attributes in sysfs */
371static void create_power_zone_common_attributes(
372 struct powercap_zone *power_zone)
373{
374 int count = 0;
375
376 power_zone->zone_dev_attrs[count++] = &dev_attr_name.attr;
377 if (power_zone->ops->get_max_energy_range_uj)
378 power_zone->zone_dev_attrs[count++] =
379 &dev_attr_max_energy_range_uj.attr;
380 if (power_zone->ops->get_energy_uj)
381 power_zone->zone_dev_attrs[count++] =
382 &dev_attr_energy_uj.attr;
383 if (power_zone->ops->get_power_uw)
384 power_zone->zone_dev_attrs[count++] =
385 &dev_attr_power_uw.attr;
386 if (power_zone->ops->get_max_power_range_uw)
387 power_zone->zone_dev_attrs[count++] =
388 &dev_attr_max_power_range_uw.attr;
389 power_zone->zone_dev_attrs[count] = NULL;
390 power_zone->zone_attr_count = count;
391}
392
393static void powercap_release(struct device *dev)
394{
395 bool allocated;
396
397 if (dev->parent) {
398 struct powercap_zone *power_zone = to_powercap_zone(dev);
399
400 /* Store flag as the release() may free memory */
401 allocated = power_zone->allocated;
402 /* Remove id from parent idr struct */
403 idr_remove(power_zone->parent_idr, power_zone->id);
404 /* Destroy idrs allocated for this zone */
405 idr_destroy(&power_zone->idr);
406 kfree(power_zone->name);
407 kfree(power_zone->zone_dev_attrs);
408 kfree(power_zone->constraints);
409 if (power_zone->ops->release)
410 power_zone->ops->release(power_zone);
411 if (allocated)
412 kfree(power_zone);
413 } else {
414 struct powercap_control_type *control_type =
415 to_powercap_control_type(dev);
416
417 /* Store flag as the release() may free memory */
418 allocated = control_type->allocated;
419 idr_destroy(&control_type->idr);
420 mutex_destroy(&control_type->lock);
421 if (control_type->ops && control_type->ops->release)
422 control_type->ops->release(control_type);
423 if (allocated)
424 kfree(control_type);
425 }
426}
427
428static ssize_t enabled_show(struct device *dev,
429 struct device_attribute *attr,
430 char *buf)
431{
432 bool mode = true;
433
434 /* Default is enabled */
435 if (dev->parent) {
436 struct powercap_zone *power_zone = to_powercap_zone(dev);
437 if (power_zone->ops->get_enable)
438 if (power_zone->ops->get_enable(power_zone, &mode))
439 mode = false;
440 } else {
441 struct powercap_control_type *control_type =
442 to_powercap_control_type(dev);
443 if (control_type->ops && control_type->ops->get_enable)
444 if (control_type->ops->get_enable(control_type, &mode))
445 mode = false;
446 }
447
448 return sprintf(buf, "%d\n", mode);
449}
450
451static ssize_t enabled_store(struct device *dev,
452 struct device_attribute *attr,
453 const char *buf, size_t len)
454{
455 bool mode;
456
457 if (strtobool(buf, &mode))
458 return -EINVAL;
459 if (dev->parent) {
460 struct powercap_zone *power_zone = to_powercap_zone(dev);
461 if (power_zone->ops->set_enable)
462 if (!power_zone->ops->set_enable(power_zone, mode))
463 return len;
464 } else {
465 struct powercap_control_type *control_type =
466 to_powercap_control_type(dev);
467 if (control_type->ops && control_type->ops->set_enable)
468 if (!control_type->ops->set_enable(control_type, mode))
469 return len;
470 }
471
472 return -ENOSYS;
473}
474
475static DEVICE_ATTR_RW(enabled);
476
477static struct attribute *powercap_attrs[] = {
478 &dev_attr_enabled.attr,
479 NULL,
480};
481ATTRIBUTE_GROUPS(powercap);
482
483static struct class powercap_class = {
484 .name = "powercap",
485 .dev_release = powercap_release,
486 .dev_groups = powercap_groups,
487};
488
489struct powercap_zone *powercap_register_zone(
490 struct powercap_zone *power_zone,
491 struct powercap_control_type *control_type,
492 const char *name,
493 struct powercap_zone *parent,
494 const struct powercap_zone_ops *ops,
495 int nr_constraints,
496 struct powercap_zone_constraint_ops *const_ops)
497{
498 int result;
499 int nr_attrs;
500
501 if (!name || !control_type || !ops ||
502 nr_constraints > MAX_CONSTRAINTS_PER_ZONE ||
503 (!ops->get_energy_uj && !ops->get_power_uw) ||
504 !control_type_valid(control_type))
505 return ERR_PTR(-EINVAL);
506
507 if (power_zone) {
508 if (!ops->release)
509 return ERR_PTR(-EINVAL);
510 memset(power_zone, 0, sizeof(*power_zone));
511 } else {
512 power_zone = kzalloc(sizeof(*power_zone), GFP_KERNEL);
513 if (!power_zone)
514 return ERR_PTR(-ENOMEM);
515 power_zone->allocated = true;
516 }
517 power_zone->ops = ops;
518 power_zone->control_type_inst = control_type;
519 if (!parent) {
520 power_zone->dev.parent = &control_type->dev;
521 power_zone->parent_idr = &control_type->idr;
522 } else {
523 power_zone->dev.parent = &parent->dev;
524 power_zone->parent_idr = &parent->idr;
525 }
526 power_zone->dev.class = &powercap_class;
527
528 mutex_lock(&control_type->lock);
529 /* Using idr to get the unique id */
530 result = idr_alloc(power_zone->parent_idr, NULL, 0, 0, GFP_KERNEL);
531 if (result < 0)
532 goto err_idr_alloc;
533
534 power_zone->id = result;
535 idr_init(&power_zone->idr);
536 power_zone->name = kstrdup(name, GFP_KERNEL);
537 if (!power_zone->name)
538 goto err_name_alloc;
539 dev_set_name(&power_zone->dev, "%s:%x",
540 dev_name(power_zone->dev.parent),
541 power_zone->id);
542 power_zone->constraints = kzalloc(sizeof(*power_zone->constraints) *
543 nr_constraints, GFP_KERNEL);
544 if (!power_zone->constraints)
545 goto err_const_alloc;
546
547 nr_attrs = nr_constraints * POWERCAP_CONSTRAINTS_ATTRS +
548 POWERCAP_ZONE_MAX_ATTRS + 1;
549 power_zone->zone_dev_attrs = kzalloc(sizeof(void *) *
550 nr_attrs, GFP_KERNEL);
551 if (!power_zone->zone_dev_attrs)
552 goto err_attr_alloc;
553 create_power_zone_common_attributes(power_zone);
554 result = create_constraints(power_zone, nr_constraints, const_ops);
555 if (result)
556 goto err_dev_ret;
557
558 power_zone->zone_dev_attrs[power_zone->zone_attr_count] = NULL;
559 power_zone->dev_zone_attr_group.attrs = power_zone->zone_dev_attrs;
560 power_zone->dev_attr_groups[0] = &power_zone->dev_zone_attr_group;
561 power_zone->dev_attr_groups[1] = NULL;
562 power_zone->dev.groups = power_zone->dev_attr_groups;
563 result = device_register(&power_zone->dev);
564 if (result)
565 goto err_dev_ret;
566
567 control_type->nr_zones++;
568 mutex_unlock(&control_type->lock);
569
570 return power_zone;
571
572err_dev_ret:
573 kfree(power_zone->zone_dev_attrs);
574err_attr_alloc:
575 kfree(power_zone->constraints);
576err_const_alloc:
577 kfree(power_zone->name);
578err_name_alloc:
579 idr_remove(power_zone->parent_idr, power_zone->id);
580err_idr_alloc:
581 if (power_zone->allocated)
582 kfree(power_zone);
583 mutex_unlock(&control_type->lock);
584
585 return ERR_PTR(result);
586}
587EXPORT_SYMBOL_GPL(powercap_register_zone);
588
589int powercap_unregister_zone(struct powercap_control_type *control_type,
590 struct powercap_zone *power_zone)
591{
592 if (!power_zone || !control_type)
593 return -EINVAL;
594
595 mutex_lock(&control_type->lock);
596 control_type->nr_zones--;
597 mutex_unlock(&control_type->lock);
598
599 device_unregister(&power_zone->dev);
600
601 return 0;
602}
603EXPORT_SYMBOL_GPL(powercap_unregister_zone);
604
605struct powercap_control_type *powercap_register_control_type(
606 struct powercap_control_type *control_type,
607 const char *name,
608 const struct powercap_control_type_ops *ops)
609{
610 int result;
611
612 if (!name)
613 return ERR_PTR(-EINVAL);
614 if (control_type) {
615 if (!ops || !ops->release)
616 return ERR_PTR(-EINVAL);
617 memset(control_type, 0, sizeof(*control_type));
618 } else {
619 control_type = kzalloc(sizeof(*control_type), GFP_KERNEL);
620 if (!control_type)
621 return ERR_PTR(-ENOMEM);
622 control_type->allocated = true;
623 }
624 mutex_init(&control_type->lock);
625 control_type->ops = ops;
626 INIT_LIST_HEAD(&control_type->node);
627 control_type->dev.class = &powercap_class;
628 dev_set_name(&control_type->dev, name);
629 result = device_register(&control_type->dev);
630 if (result) {
631 if (control_type->allocated)
632 kfree(control_type);
633 return ERR_PTR(result);
634 }
635 idr_init(&control_type->idr);
636
637 mutex_lock(&powercap_cntrl_list_lock);
638 list_add_tail(&control_type->node, &powercap_cntrl_list);
639 mutex_unlock(&powercap_cntrl_list_lock);
640
641 return control_type;
642}
643EXPORT_SYMBOL_GPL(powercap_register_control_type);
644
645int powercap_unregister_control_type(struct powercap_control_type *control_type)
646{
647 struct powercap_control_type *pos = NULL;
648
649 if (control_type->nr_zones) {
650 dev_err(&control_type->dev, "Zones of this type still not freed\n");
651 return -EINVAL;
652 }
653 mutex_lock(&powercap_cntrl_list_lock);
654 list_for_each_entry(pos, &powercap_cntrl_list, node) {
655 if (pos == control_type) {
656 list_del(&control_type->node);
657 mutex_unlock(&powercap_cntrl_list_lock);
658 device_unregister(&control_type->dev);
659 return 0;
660 }
661 }
662 mutex_unlock(&powercap_cntrl_list_lock);
663
664 return -ENODEV;
665}
666EXPORT_SYMBOL_GPL(powercap_unregister_control_type);
667
668static int __init powercap_init(void)
669{
670 int result = 0;
671
672 result = seed_constraint_attributes();
673 if (result)
674 return result;
675
676 result = class_register(&powercap_class);
677
678 return result;
679}
680
681device_initcall(powercap_init);
682
683MODULE_DESCRIPTION("PowerCap sysfs Driver");
684MODULE_AUTHOR("Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>");
685MODULE_LICENSE("GPL v2");
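
Note on the constraint attributes in powercap_sys.c above: the attribute names are built once as "constraint_<id>_<suffix>", and each show/store handler recovers the constraint id by parsing its own attribute name. The userspace sketch below illustrates that round trip. make_attr_name() and parse_constraint_id() are hypothetical helpers, snprintf() stands in for kasprintf(), and the parse check is deliberately stricter than the one in the patch (it requires exactly one conversion and rejects negative ids).

```c
#include <assert.h>
#include <stdio.h>

/* Build "constraint_<id>_<suffix>" into out; returns 0 on success, -1 if truncated. */
static int make_attr_name(char *out, size_t len, int id, const char *suffix)
{
	int n = snprintf(out, len, "constraint_%d_%s", id, suffix);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Recover the constraint id from an attribute name.
 * Returns the id, or -1 if the name does not match or the id is out of range.
 */
static int parse_constraint_id(const char *name, int nr_constraints)
{
	int id;

	assert(name != NULL);
	if (sscanf(name, "constraint_%d_", &id) != 1)
		return -1;
	if (id < 0 || id >= nr_constraints)
		return -1;
	return id;
}
```

Encoding the id in the name lets one handler serve every constraint of every zone, at the cost of a string parse on each sysfs access.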
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index a3b6b82108b9..5a1c8b71ccd8 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -4,8 +4,11 @@
 
 #ifdef __KERNEL__
 #define BIT(nr) (1UL << (nr))
+#define BIT_ULL(nr) (1ULL << (nr))
 #define BIT_MASK(nr) (1UL << ((nr) % BITS_PER_LONG))
 #define BIT_WORD(nr) ((nr) / BITS_PER_LONG)
+#define BIT_ULL_MASK(nr) (1ULL << ((nr) % BITS_PER_LONG_LONG))
+#define BIT_ULL_WORD(nr) ((nr) / BITS_PER_LONG_LONG)
 #define BITS_PER_BYTE 8
 #define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))
 #endif
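
Note on the bitops.h hunk above: the new macros mirror BIT(), BIT_MASK() and BIT_WORD() but use an unsigned long long constant, so bit positions of 32 and above stay well defined where long is 32 bits wide. The userspace sketch below copies the three macro bodies from the hunk and applies them to an array of 64-bit words. BITS_PER_LONG_LONG is defined locally as 64 as an assumption, and set_bit_ull() and test_bit_ull() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_LONG_LONG 64	/* assumption for this sketch */
#define BIT_ULL(nr)		(1ULL << (nr))
#define BIT_ULL_MASK(nr)	(1ULL << ((nr) % BITS_PER_LONG_LONG))
#define BIT_ULL_WORD(nr)	((nr) / BITS_PER_LONG_LONG)

/* Set bit nr in a bitmap stored as an array of 64-bit words. */
static void set_bit_ull(uint64_t *map, unsigned int nr)
{
	assert(map != NULL);
	map[BIT_ULL_WORD(nr)] |= BIT_ULL_MASK(nr);
}

/* Return non-zero if bit nr is set in the bitmap. */
static int test_bit_ull(const uint64_t *map, unsigned int nr)
{
	assert(map != NULL);
	return (map[BIT_ULL_WORD(nr)] & BIT_ULL_MASK(nr)) != 0;
}
```

Bit 70, for example, lands in word 1 at position 6, since 70 / 64 = 1 and 70 % 64 = 6.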
diff --git a/include/linux/powercap.h b/include/linux/powercap.h
new file mode 100644
index 000000000000..4e250417ee30
--- /dev/null
+++ b/include/linux/powercap.h
@@ -0,0 +1,325 @@
1/*
2 * powercap.h: Data types and headers for sysfs power capping interface
3 * Copyright (c) 2013, Intel Corporation.
4 *
5 * This program is free software; you can redistribute it and/or modify it
6 * under the terms and conditions of the GNU General Public License,
7 * version 2, as published by the Free Software Foundation.
8 *
9 * This program is distributed in the hope it will be useful, but WITHOUT
10 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
11 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
12 * more details.
13 *
14 * You should have received a copy of the GNU General Public License along with
15 * this program; if not, write to the Free Software Foundation, Inc.
16 *
17 */
18
19#ifndef __POWERCAP_H__
20#define __POWERCAP_H__
21
22#include <linux/device.h>
23#include <linux/idr.h>
24
25/*
26 * A power cap class device can contain multiple powercap control_types.
27 * Each control_type can have multiple power zones, which can be independently
28 * controlled. Each power zone can have one or more constraints.
29 */
30
31struct powercap_control_type;
32struct powercap_zone;
33struct powercap_zone_constraint;
34
35/**
36 * struct powercap_control_type_ops - Define control type callbacks
37 * @set_enable: Enable/Disable whole control type.
38 * Default is enabled. But this callback allows all zones
39 * to be in disable state and remove any applied power
40 * limits. If disabled power zone can only be monitored
41 * not controlled.
42 * @get_enable: get Enable/Disable status.
43 * @release: Callback to inform that last reference to this
44 * control type is closed. So it is safe to free data
45 * structure associated with this control type.
46 * This callback is mandatory if the client owns the memory
47 * for the control type.
48 *
49 * This structure defines control type callbacks to be implemented by client
50 * drivers
51 */
52struct powercap_control_type_ops {
53 int (*set_enable) (struct powercap_control_type *, bool mode);
54 int (*get_enable) (struct powercap_control_type *, bool *mode);
55 int (*release) (struct powercap_control_type *);
56};
57
58/**
59 * struct powercap_control_type- Defines a powercap control_type
60 * @name: name of control_type
61 * @dev: device for this control_type
62 * @idr: idr to have unique id for its child
63 * @nr_zones: Number of power zones registered under this control_type
64 * @ops: Pointer to callback struct
65 * @lock: mutex for control type
66 * @allocated: It is possible that the client owns the memory
67 * used by this structure. In this case
68 * this flag is set to false by framework to
69 * prevent deallocation during release process.
70 * Otherwise this flag is set to true.
71 * @node: link to the control_type list
72 *
73 * Defines powercap control_type. This acts as a container for power
74 * zones, which use same method to control power. E.g. RAPL, RAPL-PCI etc.
75 * All fields are private and should not be used by client drivers.
76 */
77struct powercap_control_type {
78 struct device dev;
79 struct idr idr;
80 int nr_zones;
81 const struct powercap_control_type_ops *ops;
82 struct mutex lock;
83 bool allocated;
84 struct list_head node;
85};
86
87/**
88 * struct powercap_zone_ops - Define power zone callbacks
89 * @get_max_energy_range_uj: Get maximum range of energy counter in
90 * micro-joules.
91 * @get_energy_uj: Get current energy counter in micro-joules.
92 * @reset_energy_uj: Reset micro-joules energy counter.
93 * @get_max_power_range_uw: Get maximum range of power counter in
94 * micro-watts.
95 * @get_power_uw: Get current power counter in micro-watts.
96 * @set_enable: Enable/Disable power zone controls.
97 * Default is enabled.
98 * @get_enable: get Enable/Disable status.
99 * @release: Callback to inform that last reference to this
100 * power zone is closed. So it is safe to free
101 * data structure associated with this
102 * power zone. Mandatory, if client driver owns
103 * the power_zone memory.
104 *
105 * This structure defines zone callbacks to be implemented by client drivers.
106 * Client drivers can define both energy and power related callbacks, but at
107 * least one type (either power or energy) is mandatory. Client drivers
108 * should handle mutual exclusion in callbacks, if required.
109 */
110struct powercap_zone_ops {
111 int (*get_max_energy_range_uj) (struct powercap_zone *, u64 *);
112 int (*get_energy_uj) (struct powercap_zone *, u64 *);
113 int (*reset_energy_uj) (struct powercap_zone *);
114 int (*get_max_power_range_uw) (struct powercap_zone *, u64 *);
115 int (*get_power_uw) (struct powercap_zone *, u64 *);
116 int (*set_enable) (struct powercap_zone *, bool mode);
117 int (*get_enable) (struct powercap_zone *, bool *mode);
118 int (*release) (struct powercap_zone *);
119};
120
121#define POWERCAP_ZONE_MAX_ATTRS 6
122#define POWERCAP_CONSTRAINTS_ATTRS 8
123#define MAX_CONSTRAINTS_PER_ZONE 10
124/**
125 * struct powercap_zone- Defines instance of a power cap zone
126 * @id: Unique id
127 * @name: Power zone name.
128 * @control_type_inst: Control type instance for this zone.
129 * @ops: Pointer to the zone operation structure.
130 * @dev: Instance of a device.
131 * @const_id_cnt: Number of constraint defined.
132 * @idr: Instance to an idr entry for children zones.
133 * @parent_idr: To remove reference from the parent idr.
134 * @private_data: Private data pointer if any for this zone.
135 * @zone_dev_attrs: Attributes associated with this device.
136 * @zone_attr_count: Attribute count.
137 * @dev_zone_attr_group: Attribute group for attributes.
138 * @dev_attr_groups: Attribute group store to register with device.
139 * @allocated: It is possible that the client owns the memory
140 * used by this structure. In this case
141 * this flag is set to false by framework to
142 * prevent deallocation during release process.
143 * Otherwise this flag is set to true.
144 * @constraints: List of constraints for this zone.
145 *
146 * This defines a power zone instance. The fields of this structure are
147 * private, and should not be used by client drivers.
148 */
149struct powercap_zone {
150 int id;
151 char *name;
152 void *control_type_inst;
153 const struct powercap_zone_ops *ops;
154 struct device dev;
155 int const_id_cnt;
156 struct idr idr;
157 struct idr *parent_idr;
158 void *private_data;
159 struct attribute **zone_dev_attrs;
160 int zone_attr_count;
161 struct attribute_group dev_zone_attr_group;
162 const struct attribute_group *dev_attr_groups[2]; /* 1 group + NULL */
163 bool allocated;
164 struct powercap_zone_constraint *constraints;
165};
166
167/**
168 * struct powercap_zone_constraint_ops - Define constraint callbacks
169 * @set_power_limit_uw: Set power limit in micro-watts.
170 * @get_power_limit_uw: Get power limit in micro-watts.
171 * @set_time_window_us: Set time window in micro-seconds.
172 * @get_time_window_us: Get time window in micro-seconds.
173 * @get_max_power_uw: Get max power allowed in micro-watts.
174 * @get_min_power_uw: Get min power allowed in micro-watts.
175 * @get_max_time_window_us: Get max time window allowed in micro-seconds.
176 * @get_min_time_window_us: Get min time window allowed in micro-seconds.
177 * @get_name: Get the name of constraint
178 *
179 * This structure is used to define the constraint callbacks for the client
180 * drivers. The following callbacks are mandatory and can't be NULL:
181 * set_power_limit_uw
182 * get_power_limit_uw
183 * set_time_window_us
184 * get_time_window_us
185 * get_name and the remaining callbacks are optional.
186 * Client drivers should handle mutual exclusion, if required in callbacks.
187 */
188struct powercap_zone_constraint_ops {
189 int (*set_power_limit_uw) (struct powercap_zone *, int, u64);
190 int (*get_power_limit_uw) (struct powercap_zone *, int, u64 *);
191 int (*set_time_window_us) (struct powercap_zone *, int, u64);
192 int (*get_time_window_us) (struct powercap_zone *, int, u64 *);
193 int (*get_max_power_uw) (struct powercap_zone *, int, u64 *);
194 int (*get_min_power_uw) (struct powercap_zone *, int, u64 *);
195 int (*get_max_time_window_us) (struct powercap_zone *, int, u64 *);
196 int (*get_min_time_window_us) (struct powercap_zone *, int, u64 *);
197 const char *(*get_name) (struct powercap_zone *, int);
198};
199
200/**
201 * struct powercap_zone_constraint- Defines instance of a constraint
202 * @id: Instance Id of this constraint.
203 * @power_zone: Pointer to the power zone for this constraint.
204 * @ops: Pointer to the constraint callbacks.
205 *
206 * This defines a constraint instance.
207 */
208struct powercap_zone_constraint {
209 int id;
210 struct powercap_zone *power_zone;
211 struct powercap_zone_constraint_ops *ops;
212};
213
214
215/* For clients to get their device pointer, may be used for dev_dbgs */
216#define POWERCAP_GET_DEV(power_zone) (&power_zone->dev)
217
218/**
219* powercap_set_zone_data() - Set private data for a zone
220* @power_zone: A pointer to the valid zone instance.
221* @pdata: A pointer to the user private data.
222*
223* Allows client drivers to associate some private data to zone instance.
224*/
225static inline void powercap_set_zone_data(struct powercap_zone *power_zone,
226 void *pdata)
227{
228 if (power_zone)
229 power_zone->private_data = pdata;
230}
231
232/**
233* powercap_get_zone_data() - Get private data for a zone
234* @power_zone: A pointer to the valid zone instance.
235*
236* Allows client drivers to get the private data associated with a zone,
237* as previously set by a call to powercap_set_zone_data().
238*/
239static inline void *powercap_get_zone_data(struct powercap_zone *power_zone)
240{
241 if (power_zone)
242 return power_zone->private_data;
243 return NULL;
244}
245
246/**
247* powercap_register_control_type() - Register a control_type with framework
248* @control_type: Pointer to client allocated memory for the control type
249* structure storage. If this is NULL, powercap framework
250* will allocate memory and own it.
251* Advantage of this parameter is that client can embed
252* this data in its data structures and allocate in a
253* single call, preventing multiple allocations.
254* @name: The name of this control_type, which will be shown
255* in the sysfs interface.
256* @ops: Callbacks for control type. This parameter is optional.
257*
258* Used to create a control_type with the power capping class. Here control_type
259* can represent a type of technology, which can control a range of power zones.
260* For example, a control_type can be RAPL (Running Average Power Limit) on
261* Intel® 64 and IA-32 Processor Architectures. The name can be any string,
262* but it must be unique; otherwise this function returns NULL.
263* A pointer to the control_type instance is returned on success.
264*/
265struct powercap_control_type *powercap_register_control_type(
266 struct powercap_control_type *control_type,
267 const char *name,
268 const struct powercap_control_type_ops *ops);
269
270/**
271* powercap_unregister_control_type() - Unregister a control_type from framework
272* @instance: A pointer to the valid control_type instance.
273*
274* Used to unregister a control_type with the power capping class.
275* All power zones registered under this control type have to be unregistered
276* before calling this function, or it will fail with an error code.
277*/
278int powercap_unregister_control_type(struct powercap_control_type *instance);
279
280/* Zone register/unregister API */
281
282/**
283* powercap_register_zone() - Register a power zone
284* @power_zone: Pointer to client allocated memory for the power zone structure
285* storage. If this is NULL, powercap framework will allocate
286* memory and own it. Advantage of this parameter is that client
287* can embed this data in its data structures and allocate in a
288* single call, preventing multiple allocations.
289* @control_type: A control_type instance under which this zone operates.
290* @name: A name for this zone.
291* @parent: A pointer to the parent power zone instance, if any, or NULL.
292* @ops: Pointer to zone operation callback structure.
293* @nr_constraints: Number of constraints for this zone
294* @const_ops: Pointer to constraint callback structure
295*
296* Register a power zone under a given control type. A power zone must register
297* a pointer to a structure representing zone callbacks.
298* A power zone can be located under a parent power zone, in which case @parent
299* should point to it. Otherwise, if @parent is NULL, the new power zone will
300* be located directly under the given control type.
301* For each power zone there may be a number of constraints that appear in
302* sysfs under that zone as attributes with unique numeric IDs.
303* Returns a pointer to the power_zone on success.
304*/
305struct powercap_zone *powercap_register_zone(
306 struct powercap_zone *power_zone,
307 struct powercap_control_type *control_type,
308 const char *name,
309 struct powercap_zone *parent,
310 const struct powercap_zone_ops *ops,
311 int nr_constraints,
312 struct powercap_zone_constraint_ops *const_ops);
313
314/**
315* powercap_unregister_zone() - Unregister a zone device
316* @control_type: A pointer to the valid instance of a control_type.
317* @power_zone: A pointer to the valid zone instance for a control_type
318*
319* Used to unregister a zone device for a control_type. Caller should
320* make sure that children for this zone are unregistered first.
321*/
322int powercap_unregister_zone(struct powercap_control_type *control_type,
323 struct powercap_zone *power_zone);
324
325#endif
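The comments for powercap_register_control_type() and powercap_register_zone() note that a client may embed the framework structure in its own data and allocate both in a single call. The following is a minimal user-space sketch of that pattern, not kernel code. The one-field `struct powercap_zone` is a stand-in for the real structure, and `struct demo_domain`, `demo_container_of()` and `demo_embedded_zone()` are hypothetical names introduced only for illustration. The two helper functions repeat the logic of the inline helpers in the header above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel structure: only the field used here (assumption). */
struct powercap_zone {
	void *private_data;
};

/* Same logic as the inline helpers declared in the header. */
static inline void powercap_set_zone_data(struct powercap_zone *power_zone,
					  void *pdata)
{
	if (power_zone)
		power_zone->private_data = pdata;
}

static inline void *powercap_get_zone_data(struct powercap_zone *power_zone)
{
	if (power_zone)
		return power_zone->private_data;
	return NULL;
}

/* Hypothetical client structure that embeds the zone. */
struct demo_domain {
	int id;
	struct powercap_zone zone;
};

/* User-space equivalent of the kernel's container_of() (hypothetical name). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Returns 0 if both ways of recovering the client data agree, -1 otherwise. */
int demo_embedded_zone(void)
{
	struct demo_domain *domain;
	struct demo_domain *from_embed;
	struct powercap_zone *zone;
	int ok;

	/* One allocation covers the client data and the embedded zone. */
	domain = calloc(1, sizeof(*domain));
	if (!domain)
		return -1;
	domain->id = 7;
	zone = &domain->zone;

	/* Option 1: recover the client structure from the embedded member. */
	from_embed = demo_container_of(zone, struct demo_domain, zone);
	ok = (from_embed == domain) && (from_embed->id == 7);

	/* Option 2: store and fetch the client structure as private data. */
	powercap_set_zone_data(zone, domain);
	ok = ok && (powercap_get_zone_data(zone) == domain);

	/* The getter tolerates a NULL zone and reports NULL. */
	ok = ok && (powercap_get_zone_data(NULL) == NULL);

	free(domain);
	return ok ? 0 : -1;
}
```

In a real driver the embedded structure would be handed to powercap_register_zone() as its first argument instead of NULL, so that the framework uses the client's storage rather than allocating its own.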