alistair23-linux

redonkable

Author	SHA1	Message	Date
Aaron Plattner	fefa8ff810	cpufreq: unlock when failing cpufreq_update_policy() Commit `bd0fa9bb45` introduced a failure path to cpufreq_update_policy() if cpufreq_driver->get(cpu) returns NULL. However, it jumps to the 'no_policy' label, which exits without unlocking any of the locks the function acquired earlier. This causes later calls into cpufreq to hang. Fix this by creating a new 'unlock' label and jumping to that instead. Fixes: `bd0fa9bb45` ("cpufreq: Return error if ->get() failed in cpufreq_update_policy()") Link: https://devtalk.nvidia.com/default/topic/751903/kernel-3-15-and-nv-drivers-337-340-failed-to-initialize-the-nvidia-kernel-module-gtx-550-ti-/ Signed-off-by: Aaron Plattner <aplattner@nvidia.com> Cc: 3.15+ <stable@vger.kernel.org> # 3.15+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-06-18 21:52:20 +02:00
Viresh Kumar	1c03a2d04d	cpufreq: add support for intermediate (stable) frequencies Douglas Anderson, recently pointed out an interesting problem due to which udelay() was expiring earlier than it should. While transitioning between frequencies few platforms may temporarily switch to a stable frequency, waiting for the main PLL to stabilize. For example: When we transition between very low frequencies on exynos, like between 200MHz and 300MHz, we may temporarily switch to a PLL running at 800MHz. No CPUFREQ notification is sent for that. That means there's a period of time when we're running at 800MHz but loops_per_jiffy is calibrated at between 200MHz and 300MHz. And so udelay behaves badly. To get this fixed in a generic way, introduce another set of callbacks get_intermediate() and target_intermediate(), only for drivers with target_index() and CPUFREQ_ASYNC_NOTIFICATION unset. get_intermediate() should return a stable intermediate frequency platform wants to switch to, and target_intermediate() should set CPU to that frequency, before jumping to the frequency corresponding to 'index'. Core will take care of sending notifications and driver doesn't have to handle them in target_intermediate() or target_index(). NOTE: ->target_index() should restore to policy->restore_freq in case of failures as core would send notifications for that. Tested-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Doug Anderson <dianders@chromium.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-06-05 23:32:29 +02:00
Viresh Kumar	8d65775d17	cpufreq: handle calls to ->target_index() in separate routine Handling calls to ->target_index() has got complex over time and might become more complex. So, its better to take target_index() bits out in another routine __target_index() for better code readability. Shouldn't have any functional impact. Tested-by: Stephen Warren <swarren@nvidia.com> Reviewed-by: Doug Anderson <dianders@chromium.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-05-29 01:27:38 +02:00
Stratos Karafotis	5eeaf1f189	cpufreq: Fix build error on some platforms that use cpufreq_for_each_* On platforms that use cpufreq_for_each_* macros, build fails if CONFIG_CPU_FREQ=n, e.g. ARM/shmobile/koelsch/non-multiplatform: drivers/built-in.o: In function `clk_round_parent': clkdev.c:(.text+0xcf168): undefined reference to `cpufreq_next_valid' drivers/built-in.o: In function `clk_rate_table_find': clkdev.c:(.text+0xcf820): undefined reference to `cpufreq_next_valid' make[3]: *** [vmlinux] Error 1 Fix this making cpufreq_next_valid function inline and move it to cpufreq.h. Fixes: `27e289dce2` (cpufreq: Introduce macros for cpufreq_frequency_table iteration) Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-05-08 13:10:56 +02:00
Srivatsa S. Bhat	ca654dc3a9	cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end Some cpufreq drivers were redundantly invoking the _begin() and _end() APIs around frequency transitions, and this double invocation (one from the cpufreq core and the other from the cpufreq driver) used to result in a self-deadlock, leading to system hangs during boot. (The _begin() API makes contending callers wait until the previous invocation is complete. Hence, the cpufreq driver would end up waiting on itself!). Now all such drivers have been fixed, but debugging this issue was not very straight-forward (even lockdep didn't catch this). So let us add a debug infrastructure to the cpufreq core to catch such issues more easily in the future. We add a new field called 'transition_task' to the policy structure, to keep track of the task which is performing the frequency transition. Using this field, we make note of this task during _begin() and print a warning if we find a case where the same task is calling _begin() again, before completing the previous frequency transition using the corresponding _end(). We have left out ASYNC_NOTIFICATION drivers from this debug infrastructure for 2 reasons: 1. At the moment, we have no way to avoid a particular scenario where this debug infrastructure can emit false-positive warnings for such drivers. The scenario is depicted below: Task A Task B /* 1st freq transition / Invoke _begin() { ... ... } Change the frequency / 2nd freq transition / Invoke _begin() { ... //waiting for B to ... //finish _end() for ... //the 1st transition ... \| Got interrupt for successful ... \| change of frequency (1st one). ... \| ... \| / 1st freq transition */ ... \| Invoke _end() { ... \| ... ... V } ... ... } This scenario is actually deadlock-free because, once Task A changes the frequency, it is Task B's responsibility to invoke the corresponding _end() for the 1st frequency transition. Hence it is perfectly legal for Task A to go ahead and attempt another frequency transition in the meantime. (Of course it won't be able to proceed until Task B finishes the 1st _end(), but this doesn't cause a deadlock or a hang). The debug infrastructure cannot handle this scenario and will treat it as a deadlock and print a warning. To avoid this, we exclude such drivers from the purview of this code. 2. Luckily, we don't _need_ this infrastructure for ASYNC_NOTIFICATION drivers at all! The cpufreq core does not automatically invoke the _begin() and _end() APIs during frequency transitions in such drivers. Thus, the driver alone is responsible for invoking _begin()/_end() and hence there shouldn't be any conflicts which lead to double invocations. So, we can skip these drivers, since the probability that such drivers will hit this problem is extremely low, as outlined above. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-05-07 00:32:39 +02:00
Stratos Karafotis	27e289dce2	cpufreq: Introduce macros for cpufreq_frequency_table iteration Many cpufreq drivers need to iterate over the cpufreq_frequency_table for various tasks. This patch introduces two macros which can be used for iteration over cpufreq_frequency_table keeping a common coding style across drivers: - cpufreq_for_each_entry: iterate over each entry of the table - cpufreq_for_each_valid_entry: iterate over each entry that contains a valid frequency. It should have no functional changes. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Lad, Prabhakar <prabhakar.csengg@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-04-30 00:05:31 +02:00
Viresh Kumar	236a980052	cpufreq: Make cpufreq_notify_transition & cpufreq_notify_post_transition static cpufreq_notify_transition() and cpufreq_notify_post_transition() shouldn't be called directly by cpufreq drivers anymore and so these should be marked static. Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-26 16:41:41 +01:00
Viresh Kumar	8fec051eea	cpufreq: Convert existing drivers to use cpufreq_freq_transition_{begin\|end} CPUFreq core has new infrastructure that would guarantee serialized calls to target() or target_index() callbacks. These are called cpufreq_freq_transition_begin() and cpufreq_freq_transition_end(). This patch converts existing drivers to use these new set of routines. Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-26 16:41:41 +01:00
Srivatsa S. Bhat	12478cf0c5	cpufreq: Make sure frequency transitions are serialized Whenever we change the frequency of a CPU, we call the PRECHANGE and POSTCHANGE notifiers. They must be serialized, i.e. PRECHANGE and POSTCHANGE notifiers should strictly alternate, thereby preventing two different sets of PRECHANGE or POSTCHANGE notifiers from interleaving arbitrarily. The following examples illustrate why this is important: Scenario 1: ----------- A thread reading the value of cpuinfo_cur_freq, will call __cpufreq_cpu_get()->cpufreq_out_of_sync()->cpufreq_notify_transition() The ondemand governor can decide to change the frequency of the CPU at the same time and hence it can end up sending the notifications via ->target(). If the notifiers are not serialized, the following sequence can occur: - PRECHANGE Notification for freq A (from cpuinfo_cur_freq) - PRECHANGE Notification for freq B (from target()) - Freq changed by target() to B - POSTCHANGE Notification for freq B - POSTCHANGE Notification for freq A We can see from the above that the last POSTCHANGE Notification happens for freq A but the hardware is set to run at freq B. Where would we break then?: adjust_jiffies() in cpufreq.c & cpufreq_callback() in arch/arm/kernel/smp.c (which also adjusts the jiffies). All the loops_per_jiffy calculations will get messed up. Scenario 2: ----------- The governor calls __cpufreq_driver_target() to change the frequency. At the same time, if we change scaling_{min\|max}_freq from sysfs, it will end up calling the governor's CPUFREQ_GOV_LIMITS notification, which will also call __cpufreq_driver_target(). And hence we end up issuing concurrent calls to ->target(). Typically, platforms have the following logic in their ->target() routines: (Eg: cpufreq-cpu0, omap, exynos, etc) A. If new freq is more than old: Increase voltage B. Change freq C. If new freq is less than old: decrease voltage Now, if the two concurrent calls to ->target() are X and Y, where X is trying to increase the freq and Y is trying to decrease it, we get the following race condition: X.A: voltage gets increased for larger freq Y.A: nothing happens Y.B: freq gets decreased Y.C: voltage gets decreased X.B: freq gets increased X.C: nothing happens Thus we can end up setting a freq which is not supported by the voltage we have set. That will probably make the clock to the CPU unstable and the system might not work properly anymore. This patch introduces a set of synchronization primitives to serialize frequency transitions, which are to be used as shown below: cpufreq_freq_transition_begin(); //Perform the frequency change cpufreq_freq_transition_end(); The _begin() call sends the PRECHANGE notification whereas the _end() call sends the POSTCHANGE notification. Also, all the necessary synchronization is handled within these calls. In particular, even drivers which set the ASYNC_NOTIFICATION flag can also use these APIs for performing frequency transitions (ie., you can call _begin() from one task, and call the corresponding _end() from a different task). The actual synchronization underneath is not that complicated: The key challenge is to allow drivers to begin the transition from one thread and end it in a completely different thread (this is to enable drivers that do asynchronous POSTCHANGE notification from bottom-halves, to also use the same interface). To achieve this, a 'transition_ongoing' flag, a 'transition_lock' spinlock and a wait-queue are added per-policy. The flag and the wait-queue are used in conjunction to create an "uninterrupted flow" from _begin() to _end(). The spinlock is used to ensure that only one such "flow" is in flight at any given time. Put together, this provides us all the necessary synchronization. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-26 16:41:40 +01:00
Viresh Kumar	0c5aa405a9	cpufreq: resume drivers before enabling governors During suspend, we first stop governors and then suspend cpufreq drivers and resume must be exactly opposite of that. i.e. resume drivers first and then start governors. But the current code in resume enables governors first and then resume drivers. Fix it be changing code sequence there. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-26 16:37:18 +01:00
Dirk Brandewie	367dc4aa93	cpufreq: Add stop CPU callback to cpufreq_driver interface This callback allows the driver to do clean up before the CPU is completely down and its state cannot be modified. This is used by the intel_pstate driver to reduce the requested P state prior to the core going away. This is required because the requested P state of the offline core is used to select the package P state. This effectively sets the floor package P state to the requested P state on the offline core. Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> [rjw: Minor modifications] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-20 03:50:12 +01:00
Stratos Karafotis	bda9f552f9	cpufreq: Remove unnecessary braces Remove unnecessary braces from a single statement. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-20 03:40:48 +01:00
Stratos Karafotis	e5c87b7628	cpufreq: Fix checkpatch errors and warnings Fix 2 checkpatch errors about using assignment in if condition, 1 checkpatch error about a required space after comma and 3 warnings about line over 80 characters. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-20 03:39:28 +01:00
Viresh Kumar	0b443ead71	cpufreq: remove unused notifier: CPUFREQ_{SUSPENDCHANGE\|RESUMECHANGE} Two cpufreq notifiers CPUFREQ_RESUMECHANGE and CPUFREQ_SUSPENDCHANGE have not been used for some time, so remove them to clean up code a bit. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> [rjw: Changelog] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-19 14:10:24 +01:00
Rafael J. Wysocki	9832235f3f	cpufreq: Do not allow ->setpolicy drivers to provide ->target cpufreq drivers that provide the ->setpolicy() callback are supposed to have integrated governors, so they don't need to set ->target() or ->target_index() and may confuse the core if any of these callbacks is present. For this reason, add a check preventing ->setpolicy cpufreq drivers from registering if they have non-NULL ->target or ->target_index. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2014-03-19 12:48:30 +01:00
Rafael J. Wysocki	15afee3aea	Merge back earlier 'pm-cpufreq' material.	2014-03-17 13:51:39 +01:00
Rafael J. Wysocki	2ed99e39cb	cpufreq: Skip current frequency initialization for ->setpolicy drivers After commit `da60ce9f2f` (cpufreq: call cpufreq_driver->get() after calling ->init()) __cpufreq_add_dev() sometimes fails for CPUs handled by intel_pstate, because that driver may return 0 from its ->get() callback if it has not run long enough to collect enough samples on the given CPU. That didn't happen before commit `da60ce9f2f` which added policy->cur initialization to __cpufreq_add_dev() to help reduce code duplication in other cpufreq drivers. However, the code added by commit `da60ce9f2f` need not be executed for cpufreq drivers having the ->setpolicy callback defined, because the subsequent invocation of cpufreq_set_policy() will use that callback to initialize the policy anyway and it doesn't need policy->cur to be initialized upfront. The analogous code in cpufreq_update_policy() is also unnecessary for cpufreq drivers having ->setpolicy set and may be skipped for them as well. Since intel_pstate provides ->setpolicy, skipping the upfront policy->cur initialization for cpufreq drivers with that callback set will cover intel_pstate and the problem it's been having after commit `da60ce9f2f` will be addressed. Fixes: `da60ce9f2f` (cpufreq: call cpufreq_driver->get() after calling ->init()) References: https://bugzilla.kernel.org/show_bug.cgi?id=71931 Reported-and-tested-by: Patrik Lundquist <patrik.lundquist@gmail.com> Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Cc: 3.13+ <stable@vger.kernel.org> # 3.13+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-13 00:37:16 +01:00
Viresh Kumar	96bbbe4a2a	cpufreq: Remove unnecessary variable/parameter 'frozen' We have used 'frozen' variable/function parameter at many places to distinguish between CPU offline/online on suspend/resume vs sysfs removals. We now have another variable cpufreq_suspended which can be used in these cases, so we can get rid of all those variables or function parameters. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-12 01:06:01 +01:00
Viresh Kumar	e0b3165ba5	cpufreq: add 'freq_table' in struct cpufreq_policy freq table is not per CPU but per policy, so it makes more sense to keep it within struct cpufreq_policy instead of a per-cpu variable. This patch does it. Over that, there is no need to set policy->freq_table to NULL in ->exit(), as policy structure is going to be freed soon. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-12 01:06:00 +01:00
Joe Perches	e837f9b58b	cpufreq: Reformat printk() statements - Add missing newlines - Coalesce format fragments - Convert printks to pr_<level> - Align arguments Based-on-patch-by: Sören Brinkmann <soren.brinkmann@xilinx.com> Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-12 00:49:22 +01:00
Viresh Kumar	e28867eab7	cpufreq: Implement cpufreq_generic_suspend() Multiple platforms need to set CPUs to a particular frequency before suspending the system, so provide a common infrastructure for them. Those platforms only need to point their ->suspend callback pointers to the generic routine. Tested-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [rjw: Changelog] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-06 15:04:12 +01:00
Viresh Kumar	2f0aea9363	cpufreq: suspend governors on system suspend/hibernate This patch adds cpufreq suspend/resume calls to dpm_{suspend\|resume}() for handling suspend/resume of cpufreq governors. Lan Tianyu (Intel) & Jinhyuk Choi (Broadcom) found an issue where the tunables configuration for clusters/sockets with non-boot CPUs was lost after system suspend/resume, as we were notifying governors with CPUFREQ_GOV_POLICY_EXIT on removal of the last CPU for that policy which caused the tunables memory to be freed. This is fixed by preventing any governor operations from being carried out between the device suspend and device resume stages of system suspend and resume, respectively. We could have added these callbacks at dpm_{suspend\|resume}_noirq() level, but there is an additional problem that the majority of I/O devices is already suspended at that point and if cpufreq drivers want to change the frequency before suspending, then that not be possible on some platforms (which depend on peripherals like i2c, regulators, etc). Reported-and-tested-by: Lan Tianyu <tianyu.lan@intel.com> Reported-by: Jinhyuk Choi <jinchoi@broadcom.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [rjw: Changelog] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-06 15:04:12 +01:00
viresh kumar	6e2c89d16d	cpufreq: move call to __find_governor() to cpufreq_init_policy() We call __find_governor() during the addition of the first CPU of each policy from __cpufreq_add_dev() to find the last governor used for this CPU before it was hot-removed. After that we call cpufreq_parse_governor() in cpufreq_init_policy(), either with this governor, or with the default governor. Right after that policy->governor is set to NULL. While that code is not functionally problematic, the structure of it is suboptimal, because some of the code required in cpufreq_init_policy() is being executed by its caller, __cpufreq_add_dev(). So, it would make more sense to get all of it together in a single place to make code more readable. Accordingly, move the code needed for policy initialization to cpufreq_init_policy() and initialize policy->governor to NULL at the beginning. In order to clean up the code a bit more, some of the #ifdefs for CONFIG_HOTPLUG_CPU are dropped too. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [rjw: Changelog] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-06 14:38:44 +01:00
Rafael J. Wysocki	3b4aff0472	Merge back earlier 'pm-cpufreq' material.	2014-03-06 13:25:59 +01:00
Viresh Kumar	4e97b631f2	cpufreq: Initialize governor for a new policy under policy->rwsem policy->rwsem is used to lock access to all parts of code modifying struct cpufreq_policy, but it's not used on a new policy created by __cpufreq_add_dev(). Because of that, if cpufreq_update_policy() is called in a tight loop on one CPU in parallel with offline/online of another CPU, then the following crash can be triggered: Unable to handle kernel NULL pointer dereference at virtual address 00000020 pgd = c0003000 [00000020] pgd=80000000004003, pmd=00000000 Internal error: Oops: 206 [#1] PREEMPT SMP ARM PC is at __cpufreq_governor+0x10/0x1ac LR is at cpufreq_update_policy+0x114/0x150 ---[ end trace f23a8defea6cd706 ]--- Kernel panic - not syncing: Fatal exception CPU0: stopping CPU: 0 PID: 7136 Comm: mpdecision Tainted: G D W 3.10.0-gd727407-00074-g979ede8 #396 [<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) from [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) from [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) from [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) from [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) from [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) from [<c0afe180>] (notifier_call_chain+0x40/0x68) [<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02812dc>] (__cpu_notify+0x28/0x44) [<c02812dc>] (__cpu_notify+0x28/0x44) from [<c0aeed90>] (_cpu_up+0xf4/0x1dc) [<c0aeed90>] (_cpu_up+0xf4/0x1dc) from [<c0aeeed4>] (cpu_up+0x5c/0x78) [<c0aeeed4>] (cpu_up+0x5c/0x78) from [<c0aec808>] (store_online+0x44/0x74) [<c0aec808>] (store_online+0x44/0x74) from [<c03a40f4>] (sysfs_write_file+0x108/0x14c) [<c03a40f4>] (sysfs_write_file+0x108/0x14c) from [<c03517d4>] (vfs_write+0xd0/0x180) [<c03517d4>] (vfs_write+0xd0/0x180) from [<c0351ca8>] (SyS_write+0x38/0x68) [<c0351ca8>] (SyS_write+0x38/0x68) from [<c0205de0>] (ret_fast_syscall+0x0/0x30) Fix that by taking locks at appropriate places in __cpufreq_add_dev() as well. Reported-by: Saravana Kannan <skannan@codeaurora.org> Suggested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [rjw: Changelog] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-06 13:25:30 +01:00
Viresh Kumar	5a7e56a5d2	cpufreq: Initialize policy before making it available for others to use Policy must be fully initialized before it is being made available for use by others. Otherwise cpufreq_cpu_get() would be able to grab a half initialized policy structure that might not have affected_cpus (for example) populated. Then, anybody accessing those fields will get a wrong value and that will lead to unpredictable results. In order to fix this, do all the necessary initialization before we make the policy structure available via cpufreq_cpu_get(). That will guarantee that any code accessing fields of the policy will get correct data from them. Reported-by: Saravana Kannan <skannan@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [rjw: Changelog] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-06 13:25:29 +01:00
Aaron Plattner	999976e0f6	cpufreq: use cpufreq_cpu_get() to avoid cpufreq_get() race conditions If a module calls cpufreq_get while cpufreq is initializing, it's possible for it to be called after cpufreq_driver is set but before cpufreq_cpu_data is written during subsys_interface_register. This happens because cpufreq_get doesn't take the cpufreq_driver_lock around its use of cpufreq_cpu_data. Fix this by using cpufreq_cpu_get(cpu) to look up the policy rather than reading it out of cpufreq_cpu_data directly. cpufreq_cpu_get() takes the appropriate locks to prevent this race from happening. Since it's possible for policy to be NULL if the caller passes in an invalid CPU number or calls the function before cpufreq is initialized, delete the BUG_ON(!policy) and simply return 0. Don't try to return -ENOENT because that's negative and the function returns an unsigned integer. References: https://bbs.archlinux.org/viewtopic.php?id=177934 Signed-off-by: Aaron Plattner <aplattner@nvidia.com> Cc: 3.13+ <stable@vger.kernel.org> # 3.13+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-06 13:25:16 +01:00
Viresh Kumar	bd0fa9bb45	cpufreq: Return error if ->get() failed in cpufreq_update_policy() cpufreq_update_policy() calls cpufreq_driver->get() to get current frequency of a CPU and it is not supposed to fail or return zero. Return error in case that happens. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-01 01:39:30 +01:00
Rashika Kheria	8a5c74a175	cpufreq: Mark function as static in cpufreq.c Mark function as static in cpufreq.c because it is not used outside this file. This eliminates the following warning in cpufreq.c: drivers/cpufreq/cpufreq.c:355:9: warning: no previous prototype for ‘show_boost’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-02-27 00:49:36 +01:00
Viresh Kumar	1c0ca90207	cpufreq: don't call cpufreq_update_policy() on CPU addition cpufreq_update_policy() is called from two places currently. From a workqueue handled queued from cpufreq_bp_resume() for boot CPU and from cpufreq_cpu_callback() whenever a CPU is added. The first one makes sure that boot CPU is running on the frequency present in policy->cpu. But we don't really need a call from cpufreq_cpu_callback(), because we always call cpufreq_driver->init() (which will set policy->cur correctly) whenever first CPU of any policy is added back. And so every policy structure is guaranteed to have the right frequency in policy->cur. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-02-24 13:37:43 +01:00
Rafael J. Wysocki	d9a789c7a0	cpufreq: Refactor cpufreq_set_policy() Reduce the rampant usage of goto and the indentation level in cpufreq_set_policy() to improve the readability of that code. No functional changes should result from that. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>	2014-02-24 13:37:43 +01:00
viresh kumar	6964d91db2	cpufreq: remove sysfs link when a cpu != policy->cpu, is removed Commit `42f921a` (cpufreq: remove sysfs files for CPUs which failed to come back after resume) tried to do this but missed this piece of code to fix. Currently we are getting this on suspend/resume: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 877 at fs/sysfs/dir.c:52 sysfs_warn_dup+0x68/0x84() sysfs: cannot create duplicate filename '/devices/system/cpu/cpu1/cpufreq' Modules linked in: brcmfmac brcmutil CPU: 0 PID: 877 Comm: test-rtc-resume Not tainted 3.14.0-rc2-00259-g9398a10cd964 #12 [<c0015bac>] (unwind_backtrace) from [<c0011850>] (show_stack+0x10/0x14) [<c0011850>] (show_stack) from [<c056e018>] (dump_stack+0x80/0xcc) [<c056e018>] (dump_stack) from [<c0025e44>] (warn_slowpath_common+0x64/0x88) [<c0025e44>] (warn_slowpath_common) from [<c0025efc>] (warn_slowpath_fmt+0x30/0x40) [<c0025efc>] (warn_slowpath_fmt) from [<c012776c>] (sysfs_warn_dup+0x68/0x84) [<c012776c>] (sysfs_warn_dup) from [<c0127a54>] (sysfs_do_create_link_sd+0xb0/0xb8) [<c0127a54>] (sysfs_do_create_link_sd) from [<c038ef64>] (__cpufreq_add_dev.isra.27+0x2a8/0x814) [<c038ef64>] (__cpufreq_add_dev.isra.27) from [<c038f548>] (cpufreq_cpu_callback+0x70/0x8c) [<c038f548>] (cpufreq_cpu_callback) from [<c0043864>] (notifier_call_chain+0x44/0x84) [<c0043864>] (notifier_call_chain) from [<c0025f60>] (__cpu_notify+0x28/0x44) [<c0025f60>] (__cpu_notify) from [<c00261e8>] (_cpu_up+0xf0/0x140) [<c00261e8>] (_cpu_up) from [<c0569eb8>] (enable_nonboot_cpus+0x68/0xb0) [<c0569eb8>] (enable_nonboot_cpus) from [<c006339c>] (suspend_devices_and_enter+0x198/0x2dc) [<c006339c>] (suspend_devices_and_enter) from [<c0063654>] (pm_suspend+0x174/0x1e8) [<c0063654>] (pm_suspend) from [<c00624e0>] (state_store+0x6c/0xbc) [<c00624e0>] (state_store) from [<c01fc200>] (kobj_attr_store+0x14/0x20) [<c01fc200>] (kobj_attr_store) from [<c0126e50>] (sysfs_kf_write+0x44/0x48) [<c0126e50>] (sysfs_kf_write) from [<c012a274>] (kernfs_fop_write+0xb4/0x14c) [<c012a274>] (kernfs_fop_write) from [<c00d4818>] (vfs_write+0xa8/0x180) [<c00d4818>] (vfs_write) from [<c00d4bb8>] (SyS_write+0x3c/0x70) [<c00d4bb8>] (SyS_write) from [<c000e620>] (ret_fast_syscall+0x0/0x30) ---[ end trace 76969904b614c18f ]--- Fix this by removing sysfs link for cpufreq directory when cpu removed isn't policy->cpu. Revamps: `42f921a` (cpufreq: remove sysfs files for CPUs which failed to come back after resume) Reported-and-tested-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-02-19 01:04:40 +01:00
Lukasz Majewski	6f19efc0a1	cpufreq: Add boost frequency support in core This commit adds boost frequency support in cpufreq core (Hardware & Software). Some SoCs (like Exynos4 - e.g. 4x12) allow setting frequency above its normal operation limits. Such mode shall be only used for a short time. Overclocking (boost) support is essentially provided by platform dependent cpufreq driver. This commit unifies support for SW and HW (Intel) overclocking solutions in the core cpufreq driver. Previously the "boost" sysfs attribute was defined in the ACPI processor driver code. By default boost is disabled. One global attribute is available at: /sys/devices/system/cpu/cpufreq/boost. It only shows up when cpufreq driver supports overclocking. Under the hood frequencies dedicated for boosting are marked with a special flag (CPUFREQ_BOOST_FREQ) at driver's frequency table. It is the user's concern to enable/disable overclocking with a proper call to sysfs. The cpufreq_boost_trigger_state() function is defined non static on purpose. It is used later with thermal subsystem to provide automatic enable/disable of the BOOST feature. Signed-off-by: Lukasz Majewski <l.majewski@samsung.com> Signed-off-by: Myungjoo Ham <myungjoo.ham@samsung.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-01-17 02:00:44 +01:00
Viresh Kumar	652ed95d5f	cpufreq: introduce cpufreq_generic_get() routine CPUFreq drivers that use clock frameworks interface,i.e. clk_get_rate(), to get CPUs clk rate, have similar sort of code used in most of them. This patch adds a generic ->get() which will do the same thing for them. All those drivers are required to now is to set .get to cpufreq_generic_get() and set their clk pointer in policy->clk during ->init(). Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no> Acked-by: Shawn Guo <shawn.guo@linaro.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Shawn Guo <shawn.guo@linaro.org> Acked-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-01-17 02:00:44 +01:00
Viresh Kumar	fcd7af917a	cpufreq: stats: handle cpufreq_unregister_driver() and suspend/resume properly There are several problems with cpufreq stats in the way it handles cpufreq_unregister_driver() and suspend/resume.. - We must not lose data collected so far when suspend/resume happens and so stats directories must not be removed/allocated during these operations, which is done currently. - cpufreq_stat has registered notifiers with both cpufreq and hotplug. It adds sysfs stats directory with a cpufreq notifier: CPUFREQ_NOTIFY and removes this directory with a notifier from hotplug core. In case cpufreq_unregister_driver() is called (on rmmod cpufreq driver), stats directories per cpu aren't removed as CPUs are still online. The only call cpufreq_stats gets is cpufreq_stats_update_policy_cpu() for all CPUs except the last of each policy. And pointer to stat information is stored in the entry for last CPU in the per-cpu cpufreq_stats_table. But policy structure would be freed inside cpufreq core and so that will result in memory leak inside cpufreq stats (as we are never freeing memory for stats). Now if we again insert the module cpufreq_register_driver() will be called and we will again allocate stats data and put it on for first CPU of every policy. In case we only have a single CPU per policy, we will return with a error from cpufreq_stats_create_table() due to this code: if (per_cpu(cpufreq_stats_table, cpu)) return -EBUSY; And so probably cpufreq stats directory would not show up anymore (as it was added inside last policies->kobj which doesn't exist anymore). I haven't tested it, though. Also the values in stats files wouldn't be refreshed as we are using the earlier stats structure. - CPUFREQ_NOTIFY is called from cpufreq_set_policy() which is called for scenarios where we don't really want cpufreq_stat_notifier_policy() to get called. For example whenever we are changing anything related to a policy: min/max/current freq, etc. cpufreq_set_policy() is called and so cpufreq stats is notified. Where we don't do any useful stuff other than simply returning with -EBUSY from cpufreq_stats_create_table(). And so this isn't the right notifier that cpufreq stats.. Due to all above reasons this patch does following changes: - Add new notifiers CPUFREQ_CREATE_POLICY and CPUFREQ_REMOVE_POLICY, which are only called when policy is created/destroyed. They aren't called for suspend/resume paths.. - Use these notifiers in cpufreq_stat_notifier_policy() to create/destory stats sysfs entries. And so cpufreq_unregister_driver() or suspend/resume shouldn't be a problem for cpufreq_stats. - Return early from cpufreq_stat_cpu_callback() for suspend/resume sequence, so that we don't free stats structure. Acked-by: Nicolas Pitre <nico@linaro.org> Tested-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-01-17 02:00:43 +01:00
Viresh Kumar	d3916691c9	cpufreq: Make sure CPU is running on a freq from freq-table Sometimes boot loaders set CPU frequency to a value outside of frequency table present with cpufreq core. In such cases CPU might be unstable if it has to run on that frequency for long duration of time and so its better to set it to a frequency which is specified in freq-table. This also makes cpufreq stats inconsistent as cpufreq-stats would fail to register because current frequency of CPU isn't found in freq-table. Because we don't want this change to affect boot process badly, we go for the next freq which is >= policy->cur ('cur' must be set by now, otherwise we will end up setting freq to lowest of the table as 'cur' is initialized to zero). In case current frequency doesn't match any frequency from freq-table, we throw warnings to user, so that user can get this fixed in their bootloaders or freq-tables. Reported-by: Carlos Hernandez <ceh@ti.com> Reported-and-tested-by: Nishanth Menon <nm@ti.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-01-06 14:17:25 +01:00
Viresh Kumar	ab1b1c4e82	cpufreq: send new set of notification for transition failures In the current code, if we fail during a frequency transition, we simply send the POSTCHANGE notification with the old frequency. This isn't enough. One of the core users of these notifications is the code responsible for keeping loops_per_jiffy aligned with frequency changes. And mostly it is written as: if ((val == CPUFREQ_PRECHANGE && freq->old < freq->new) \|\| (val == CPUFREQ_POSTCHANGE && freq->old > freq->new)) { update-loops-per-jiffy... } So, suppose we are changing to a higher frequency and failed during transition, then following will happen: - CPUFREQ_PRECHANGE notification with freq-new > freq-old - CPUFREQ_POSTCHANGE notification with freq-new == freq-old The first one will update loops_per_jiffy and second one will do nothing. Even if we send the 2nd notification by exchanging values of freq-new and old, some users of these notifications might get unstable. This can be fixed by simply calling cpufreq_notify_post_transition() with error code and this routine will take care of sending notifications in the correct order. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [rjw: Folded 3 patches into one, rebased unicore2 changes] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-01-06 01:43:44 +01:00
Viresh Kumar	f7ba3b41e2	cpufreq: Introduce cpufreq_notify_post_transition() This introduces a new routine cpufreq_notify_post_transition() which can be used to send POSTCHANGE notification for new freq with or without both {PRE\|POST}CHANGE notifications for last freq. This is useful at multiple places, especially for sending transition failure notifications. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-01-06 01:43:44 +01:00
Jane Li	6f1e4efd88	cpufreq: Fix timer/workqueue corruption by protecting reading governor_enabled When a CPU is hot removed we'll cancel all the delayed work items via gov_cancel_work(). Sometimes the delayed work function determines that it should adjust the delay for all other CPUs that the policy is managing. If this scenario occurs, the canceling CPU will cancel its own work but queue up the other CPUs works to run. Commit 3617f2 (cpufreq: Fix timer/workqueue corruption due to double queueing) has tried to fix this, but reading governor_enabled is not protected by cpufreq_governor_lock. Even though od_dbs_timer() checks governor_enabled before gov_queue_work(), this scenario may occur. For example: CPU0 CPU1 ---- ---- cpu_down() ... <work runs> __cpufreq_remove_dev() od_dbs_timer() __cpufreq_governor() policy->governor_enabled policy->governor_enabled = false; cpufreq_governor_dbs() case CPUFREQ_GOV_STOP: gov_cancel_work(dbs_data, policy); cpu0 work is canceled timer is canceled cpu1 work is canceled <waits for cpu1> gov_queue_work(, , true); cpu0 work queued cpu1 work queued cpu2 work queued ... cpu1 work is canceled cpu2 work is canceled ... At the end of the GOV_STOP case cpu0 still has a work queued to run although the code is expecting all of the works to be canceled. __cpufreq_remove_dev() will then proceed to re-initialize all the other CPUs works except for the CPU that is going down. The CPUFREQ_GOV_START case in cpufreq_governor_dbs() will trample over the queued work and debugobjects will spit out a warning: WARNING: at lib/debugobjects.c:260 debug_print_object+0x94/0xbc() ODEBUG: init active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x14 Modules linked in: CPU: 1 PID: 1205 Comm: sh Tainted: G W 3.10.0 #200 [<c01144f0>] (unwind_backtrace+0x0/0xf8) from [<c0111d98>] (show_stack+0x10/0x14) [<c0111d98>] (show_stack+0x10/0x14) from [<c01272cc>] (warn_slowpath_common+0x4c/0x68) [<c01272cc>] (warn_slowpath_common+0x4c/0x68) from [<c012737c>] (warn_slowpath_fmt+0x30/0x40) [<c012737c>] (warn_slowpath_fmt+0x30/0x40) from [<c034c640>] (debug_print_object+0x94/0xbc) [<c034c640>] (debug_print_object+0x94/0xbc) from [<c034c7f8>] (__debug_object_init+0xc8/0x3c0) [<c034c7f8>] (__debug_object_init+0xc8/0x3c0) from [<c01360e0>] (init_timer_key+0x20/0x104) [<c01360e0>] (init_timer_key+0x20/0x104) from [<c04872ac>] (cpufreq_governor_dbs+0x1dc/0x68c) [<c04872ac>] (cpufreq_governor_dbs+0x1dc/0x68c) from [<c04833a8>] (__cpufreq_governor+0x80/0x1b0) [<c04833a8>] (__cpufreq_governor+0x80/0x1b0) from [<c0483704>] (__cpufreq_remove_dev.isra.12+0x22c/0x380) [<c0483704>] (__cpufreq_remove_dev.isra.12+0x22c/0x380) from [<c0692f38>] (cpufreq_cpu_callback+0x48/0x5c) [<c0692f38>] (cpufreq_cpu_callback+0x48/0x5c) from [<c014fb40>] (notifier_call_chain+0x44/0x84) [<c014fb40>] (notifier_call_chain+0x44/0x84) from [<c012ae44>] (__cpu_notify+0x2c/0x48) [<c012ae44>] (__cpu_notify+0x2c/0x48) from [<c068dd40>] (_cpu_down+0x80/0x258) [<c068dd40>] (_cpu_down+0x80/0x258) from [<c068df40>] (cpu_down+0x28/0x3c) [<c068df40>] (cpu_down+0x28/0x3c) from [<c068e4c0>] (store_online+0x30/0x74) [<c068e4c0>] (store_online+0x30/0x74) from [<c03a7308>] (dev_attr_store+0x18/0x24) [<c03a7308>] (dev_attr_store+0x18/0x24) from [<c0256fe0>] (sysfs_write_file+0x100/0x180) [<c0256fe0>] (sysfs_write_file+0x100/0x180) from [<c01fec9c>] (vfs_write+0xbc/0x184) [<c01fec9c>] (vfs_write+0xbc/0x184) from [<c01ff034>] (SyS_write+0x40/0x68) [<c01ff034>] (SyS_write+0x40/0x68) from [<c010e200>] (ret_fast_syscall+0x0/0x48) In gov_queue_work(), lock cpufreq_governor_lock before gov_queue_work, and unlock it after __gov_queue_work(). In this way, governor_enabled is guaranteed not changed in gov_queue_work(). Signed-off-by: Jane Li <jiel@marvell.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-01-06 01:22:02 +01:00
Viresh Kumar	08fd8c1cf0	cpufreq: preserve user_policy across suspend/resume Prevent __cpufreq_add_dev() from overwriting the existing values of user_policy.{min\|max\|policy\|governor} with defaults during resume from system suspend. Fixes: `5302c3fb2e` ("cpufreq: Perform light-weight init/teardown during suspend/resume") Reported-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 3.12+ <stable@vger.kernel.org> # 3.12+ [rjw: Changelog] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-12-29 15:31:21 +01:00
Rafael J. Wysocki	72368d122c	cpufreq: Clean up after a failing light-weight initialization If cpufreq_policy_restore() returns NULL during system resume, __cpufreq_add_dev() should just fall back to the full initialization instead of returning an error, because that may actually make things work. Moreover, it should not leave stale fallback data behind after it has failed to restore a previously existing policy. This change is based on Viresh Kumar's work. Fixes: `5302c3fb2e` ("cpufreq: Perform light-weight init/teardown during suspend/resume") Reported-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 3.12+ <stable@vger.kernel.org> # 3.12+	2013-12-29 15:30:36 +01:00
Jason Baron	a27a9ab706	cpufreq: Use CONFIG_CPU_FREQ_DEFAULT_* to set initial policy for setpolicy drivers When configuring a default governor (via CONFIG_CPU_FREQ_DEFAULT_*) with the intel_pstate driver, the desired default policy is not properly set. For example, setting 'CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE' ends up with the 'powersave' policy being set. Fix by configuring the correct default policy, if either 'powersave' or 'performance' are requested. Otherwise, fallback to what the driver originally set via its 'init' routine. Signed-off-by: Jason Baron <jbaron@akamai.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-12-22 00:51:52 +01:00
Viresh Kumar	42f921a6f1	cpufreq: remove sysfs files for CPUs which failed to come back after resume There are cases where cpufreq_add_dev() may fail for some CPUs during system resume. With the current code we will still have sysfs cpufreq files for those CPUs and struct cpufreq_policy would be already freed for them. Hence any operation on those sysfs files would result in kernel warnings. Example of problems resulting from resume errors (from Bjørn Mork): WARNING: CPU: 0 PID: 6055 at fs/sysfs/file.c:343 sysfs_open_file+0x77/0x212() missing sysfs attribute operations for kobject: (null) Modules linked in: [stripped as irrelevant] CPU: 0 PID: 6055 Comm: grep Tainted: G D 3.13.0-rc2 #153 Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011 0000000000000009 ffff8802327ebb78 ffffffff81380b0e 0000000000000006 ffff8802327ebbc8 ffff8802327ebbb8 ffffffff81038635 0000000000000000 ffffffff811823c7 ffff88021a19e688 ffff88021a19e688 ffff8802302f9310 Call Trace: [<ffffffff81380b0e>] dump_stack+0x55/0x76 [<ffffffff81038635>] warn_slowpath_common+0x7c/0x96 [<ffffffff811823c7>] ? sysfs_open_file+0x77/0x212 [<ffffffff810386e3>] warn_slowpath_fmt+0x41/0x43 [<ffffffff81182dec>] ? sysfs_get_active+0x6b/0x82 [<ffffffff81182382>] ? sysfs_open_file+0x32/0x212 [<ffffffff811823c7>] sysfs_open_file+0x77/0x212 [<ffffffff81182350>] ? sysfs_schedule_callback+0x1ac/0x1ac [<ffffffff81122562>] do_dentry_open+0x17c/0x257 [<ffffffff8112267e>] finish_open+0x41/0x4f [<ffffffff81130225>] do_last+0x80c/0x9ba [<ffffffff8112dbbd>] ? inode_permission+0x40/0x42 [<ffffffff81130606>] path_openat+0x233/0x4a1 [<ffffffff81130b7e>] do_filp_open+0x35/0x85 [<ffffffff8113b787>] ? __alloc_fd+0x172/0x184 [<ffffffff811232ea>] do_sys_open+0x6b/0xfa [<ffffffff811233a7>] SyS_openat+0xf/0x11 [<ffffffff8138c812>] system_call_fastpath+0x16/0x1b To fix this, remove those sysfs files or put the associated kobject in case of such errors. Also, to make it simple, remove the cpufreq sysfs links from all the CPUs (except for the policy->cpu) during suspend, as that operation won't result in a loss of sysfs file permissions and we can create those links during resume just fine. Fixes: `5302c3fb2e` ("cpufreq: Perform light-weight init/teardown during suspend/resume") Reported-and-tested-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 3.12+ <stable@vger.kernel.org> # 3.12+ [rjw: Changelog] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-12-22 00:47:46 +01:00
Rafael J. Wysocki	d4faadd5d5	Revert "cpufreq: fix garbage kobjects on errors during suspend/resume" Commit `2167e2399d` (cpufreq: fix garbage kobjects on errors during suspend/resume) breaks suspend/resume on Martin Ziegler's system (hard lockup during resume), so revert it. Fixes: `2167e2399d` (cpufreq: fix garbage kobjects on errors during suspend/resume) References: https://bugzilla.kernel.org/show_bug.cgi?id=66751 Reported-by: Martin Ziegler <ziegler@uni-freiburg.de> Cc: 3.12+ <stable@vger.kernel.org> # 3.12+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-12-08 01:32:41 +01:00
Rafael J. Wysocki	12205a4b79	Revert "cpufreq: suspend governors on system suspend/hibernate" Commit `5a87182aa2` (cpufreq: suspend governors on system suspend/hibernate) causes hibernation problems to happen on Bjørn Mork's and Paul Bolle's systems, so revert it. Fixes: `5a87182aa2` (cpufreq: suspend governors on system suspend/hibernate) Reported-by: Bjørn Mork <bjorn@mork.no> Reported-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-12-08 01:04:17 +01:00
Bjørn Mork	2167e2399d	cpufreq: fix garbage kobjects on errors during suspend/resume This is effectively a revert of commit `5302c3fb2e` ("cpufreq: Perform light-weight init/teardown during suspend/resume"), which enabled suspend/resume optimizations leaving the sysfs files in place. Errors during suspend/resume are not handled properly, leaving dead sysfs attributes in case of failures. There are are number of functions with special code for the "frozen" case, and all these need to also have special error handling. The problem is easy to demonstrate by making cpufreq_driver->init() or cpufreq_driver->get() fail during resume. The code is too complex for a simple fix, with split code paths in multiple blocks within a number of functions. It is therefore best to revert the patch enabling this code until the error handling is in place. Examples of problems resulting from resume errors: WARNING: CPU: 0 PID: 6055 at fs/sysfs/file.c:343 sysfs_open_file+0x77/0x212() missing sysfs attribute operations for kobject: (null) Modules linked in: [stripped as irrelevant] CPU: 0 PID: 6055 Comm: grep Tainted: G D 3.13.0-rc2 #153 Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011 0000000000000009 ffff8802327ebb78 ffffffff81380b0e 0000000000000006 ffff8802327ebbc8 ffff8802327ebbb8 ffffffff81038635 0000000000000000 ffffffff811823c7 ffff88021a19e688 ffff88021a19e688 ffff8802302f9310 Call Trace: [<ffffffff81380b0e>] dump_stack+0x55/0x76 [<ffffffff81038635>] warn_slowpath_common+0x7c/0x96 [<ffffffff811823c7>] ? sysfs_open_file+0x77/0x212 [<ffffffff810386e3>] warn_slowpath_fmt+0x41/0x43 [<ffffffff81182dec>] ? sysfs_get_active+0x6b/0x82 [<ffffffff81182382>] ? sysfs_open_file+0x32/0x212 [<ffffffff811823c7>] sysfs_open_file+0x77/0x212 [<ffffffff81182350>] ? sysfs_schedule_callback+0x1ac/0x1ac [<ffffffff81122562>] do_dentry_open+0x17c/0x257 [<ffffffff8112267e>] finish_open+0x41/0x4f [<ffffffff81130225>] do_last+0x80c/0x9ba [<ffffffff8112dbbd>] ? inode_permission+0x40/0x42 [<ffffffff81130606>] path_openat+0x233/0x4a1 [<ffffffff81130b7e>] do_filp_open+0x35/0x85 [<ffffffff8113b787>] ? __alloc_fd+0x172/0x184 [<ffffffff811232ea>] do_sys_open+0x6b/0xfa [<ffffffff811233a7>] SyS_openat+0xf/0x11 [<ffffffff8138c812>] system_call_fastpath+0x16/0x1b The failure to restore cpufreq devices on cancelled hibernation is not a new bug. It is caused by the ACPI _PPC call failing unless the hibernate is completed. This makes the acpi_cpufreq driver fail its init. Previously, the cpufreq device could be restored by offlining the cpu temporarily. And as a complete hibernation cycle would do this, it would be automatically restored most of the time. But after commit `5302c3fb2e` the leftover sysfs attributes will block any device add action. Therefore offlining and onlining CPU 1 will no longer restore the cpufreq object, and a complete suspend/resume cycle will replace it with garbage. Fixes: `5302c3fb2e` ("cpufreq: Perform light-weight init/teardown during suspend/resume") Cc: 3.12+ <stable@vger.kernel.org> # 3.12+ Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-12-03 15:25:52 +01:00
Viresh Kumar	5a87182aa2	cpufreq: suspend governors on system suspend/hibernate This patch adds cpufreq suspend/resume calls to dpm_{suspend\|resume}_noirq() for handling suspend/resume of cpufreq governors. Lan Tianyu (Intel) & Jinhyuk Choi (Broadcom) found anr issue where tunables configuration for clusters/sockets with non-boot CPUs was getting lost after suspend/resume, as we were notifying governors with CPUFREQ_GOV_POLICY_EXIT on removal of the last cpu for that policy and so deallocating memory for tunables. This is fixed by this patch as we don't allow any operation on governors after device suspend and before device resume now. Reported-and-tested-by: Lan Tianyu <tianyu.lan@intel.com> Reported-by: Jinhyuk Choi <jinchoi@broadcom.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [rjw: Changelog, minor cleanups] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-11-28 14:47:31 +01:00
Viresh Kumar	d4019f0a92	cpufreq: move freq change notifications to cpufreq core Most of the drivers do following in their ->target_index() routines: struct cpufreq_freqs freqs; freqs.old = old freq... freqs.new = new freq... cpufreq_notify_transition(policy, &freqs, CPUFREQ_PRECHANGE); /* Change rate here */ cpufreq_notify_transition(policy, &freqs, CPUFREQ_POSTCHANGE); This is replicated over all cpufreq drivers today and there doesn't exists a good enough reason why this shouldn't be moved to cpufreq core instead. There are few special cases though, like exynos5440, which doesn't do everything on the call to ->target_index() routine and call some kind of bottom halves for doing this work, work/tasklet/etc.. They may continue doing notification from their own code as flag: CPUFREQ_ASYNC_NOTIFICATION is already set for them. All drivers are also modified in this patch to avoid breaking 'git bisect', as double notification would happen otherwise. Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Russell King <linux@arm.linux.org.uk> Acked-by: Stephen Warren <swarren@nvidia.com> Tested-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Nicolas Pitre <nicolas.pitre@linaro.org> Reviewed-by: Lan Tianyu <tianyu.lan@intel.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-10-31 00:11:08 +01:00
viresh kumar	ad7722dab7	cpufreq: create per policy rwsem instead of per CPU cpu_policy_rwsem We have per-CPU cpu_policy_rwsem for cpufreq core, but we never use all of them. We always use rwsem of policy->cpu and so we can actually make this rwsem per policy instead. This patch does this change. With this change other tricky situations are also avoided now, like which lock to take while we are changing policy->cpu, etc. Suggested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-10-25 23:54:12 +02:00
Viresh Kumar	9c0ebcf78f	cpufreq: Implement light weight ->target_index() routine Currently, the prototype of cpufreq_drivers target routines is: int target(struct cpufreq_policy policy, unsigned int target_freq, unsigned int relation); And most of the drivers call cpufreq_frequency_table_target() to get a valid index of their frequency table which is closest to the target_freq. And they don't use target_freq and relation after that. So, it makes sense to just do this work in cpufreq core before calling cpufreq_frequency_table_target() and simply pass index instead. But this can be done only with drivers which expose their frequency table with cpufreq core. For others we need to stick with the old prototype of target() until those drivers are converted to expose frequency tables. This patch implements the new light weight prototype for target_index() routine. It looks like this: int target_index(struct cpufreq_policy policy, unsigned int index); CPUFreq core will call cpufreq_frequency_table_target() before calling this routine and pass index to it. Because CPUFreq core now requires to call routines present in freq_table.c CONFIG_CPU_FREQ_TABLE must be enabled all the time. This also marks target() interface as deprecated. So, that new drivers avoid using it. And Documentation is updated accordingly. It also converts existing .target() to newly defined light weight .target_index() routine for many driver. Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Russell King <linux@arm.linux.org.uk> Acked-by: David S. Miller <davem@davemloft.net> Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rjw@rjwysocki.net>	2013-10-25 22:42:24 +02:00
Srivatsa S. Bhat	99ec899eaf	cpufreq: Detect spurious invocations of update_policy_cpu() The function update_policy_cpu() is expected to be called when the policy->cpu of a cpufreq policy is to be changed: ie., the new CPU nominated to become the policy->cpu is different from the old one. Print a warning if it is invoked with new_cpu == old_cpu, since such an invocation might hint at a faulty logic in the caller. Suggested-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-10-17 01:09:16 +02:00
Viresh Kumar	3bc28ab6da	cpufreq: remove CONFIG_CPU_FREQ_TABLE CONFIG_CPU_FREQ_TABLE will be always enabled when cpufreq framework is used, as cpufreq core depends on it. So, we don't need this CONFIG option anymore as it is not configurable. Remove CONFIG_CPU_FREQ_TABLE and update its users. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-10-16 00:50:33 +02:00
Viresh Kumar	70e9e77833	cpufreq: create cpufreq_generic_init() routine Many CPUFreq drivers for SMP system (where all cores share same clock lines), do similar stuff in their ->init() part. This patch creates a generic routine in cpufreq core which can be used by these so that we can remove some redundant code. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-10-16 00:50:33 +02:00
Viresh Kumar	da60ce9f2f	cpufreq: call cpufreq_driver->get() after calling ->init() Almost all drivers set policy->cur with current CPU frequency in their ->init() part. This can be done for all of them at core level and so they wouldn't need to do it. This patch adds supporting code in cpufreq core for calling get() after we have called init() for a policy. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-10-16 00:50:28 +02:00
Viresh Kumar	0b981e7074	cpufreq: use cpufreq_driver->flags to mark CPUFREQ_HAVE_GOVERNOR_PER_POLICY Use cpufreq_driver->flags to mark CPUFREQ_HAVE_GOVERNOR_PER_POLICY instead of a separate field within cpufreq_driver. This will save some bytes of memory. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-10-16 00:50:23 +02:00
Viresh Kumar	037ce8397d	cpufreq: rename __cpufreq_set_policy() as cpufreq_set_policy() Earlier there used to be two functions named __cpufreq_set_policy() and cpufreq_set_policy(), but now we only have a single routine lets name it cpufreq_set_policy() instead of __cpufreq_set_policy(). This also removes some invalid comments or fixes some incorrect comments. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-10-16 00:50:23 +02:00
Viresh Kumar	27a862e983	cpufreq: remove __cpufreq_remove_dev() Nobody except cpufreq_remove_dev() calls __cpufreq_remove_dev() and so we don't need two separate routines here. Merge code from __cpufreq_remove_dev() into cpufreq_remove_dev() and get rid of __cpufreq_remove_dev(). Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-10-16 00:50:22 +02:00
Viresh Kumar	75949c9a1f	cpufreq: don't break string in print statements As a rule its better not to break string (quoted inside "") in a print statement even if it crosses 80 column boundary as that may introduce bugs and so this patch rewrites one of the print statements.. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-10-16 00:50:22 +02:00
Viresh Kumar	bbdd04ab1f	cpufreq: Remove extra blank line We don't need a blank line just at start of a block, lets remove it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-10-16 00:50:22 +02:00
Viresh Kumar	67a29e558b	cpufreq: remove invalid comment from __cpufreq_remove_dev() Some section of kerneldoc comment for __cpufreq_remove_dev() is invalid now. Remove it. Suggested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-10-16 00:50:21 +02:00
Viresh Kumar	1b750e3bda	cpufreq: make return type of lock_policy_rwsem_{read\|write}() as void lock_policy_rwsem_{read\|write}() currently has return type of int, but it always returns zero and hence its return type should be void instead. This patch makes that change and modifies all of the users accordingly. Reported-by: Jon Medhurst<tixy@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-10-10 02:51:14 +02:00
Viresh Kumar	26ca869434	cpufreq: check cpufreq driver is valid and cpufreq isn't disabled in cpufreq_get() cpufreq_get() can be called from external drivers which might not be aware if cpufreq driver is registered or not. And so we should actually check if cpufreq driver is registered or not and also if cpufreq is active or disabled, at the beginning of cpufreq_get(). Otherwise call to lock_policy_rwsem_read() might hit BUG_ON(!policy). Reported-and-tested-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-09-25 03:24:02 +02:00
Yinghai Lu	4dea5806d3	cpufreq: return EEXIST instead of EBUSY for second registering On systems that support intel_pstate, acpi_cpufreq fails to load, and udev keeps trying until trace gets filled up and kernel crashes. The root cause is driver return ret from cpufreq_register_driver(), because when some other driver takes over before, it will return EBUSY and then udev will keep trying ... cpufreq_register_driver() should return EEXIST instead so that the system can boot without appending intel_pstate=disable and still use intel_pstate. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-09-20 00:37:10 +02:00
Viresh Kumar	8efd57657d	cpufreq: unlock correct rwsem while updating policy->cpu Current code looks like this: WARN_ON(lock_policy_rwsem_write(cpu)); update_policy_cpu(policy, new_cpu); unlock_policy_rwsem_write(cpu); {lock\|unlock}_policy_rwsem_write(cpu) takes/releases policy->cpu's rwsem. Because cpu is changing with the call to update_policy_cpu(), the unlock_policy_rwsem_write() will release the incorrect lock. The right solution would be to release the same lock as was taken earlier. Also update_policy_cpu() was also called from cpufreq_add_dev() without any locks and so its better if we move this locking to inside update_policy_cpu(). This patch fixes a regression introduced in 3.12 by commit `f9ba680d23` (cpufreq: Extract the handover of policy cpu to a helper function). Reported-and-tested-by: Jon Medhurst<tixy@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-09-18 00:01:52 +02:00
Viresh Kumar	9c8f1ee40b	cpufreq: Clear policy->cpus bits in __cpufreq_remove_dev_finish() This broke after a recent change "cedb70a cpufreq: Split __cpufreq_remove_dev() into two parts" from Srivatsa. Consider a scenario where we have two CPUs in a policy (0 & 1) and we are removing CPU 1. On the call to __cpufreq_remove_dev_prepare() we have cleared 1 from policy->cpus and now on a call to __cpufreq_remove_dev_finish() we read cpumask_weight of policy->cpus, which will come as 1 and this code will behave as if we are removing the last CPU from policy :) Fix it by clearing the CPU mask in __cpufreq_remove_dev_finish() instead of __cpufreq_remove_dev_prepare(). Tested-by: Stephen Warren <swarren@wwwdotorg.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-09-18 00:01:27 +02:00
Lan Tianyu	44871c9c7f	cpufreq: Acquire the lock in cpufreq_policy_restore() for reading In cpufreq_policy_restore() before system suspend policy is read from percpu's cpufreq_cpu_data_fallback. It's a read operation rather than a write one, so take the lock for reading in there. Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-09-11 23:30:03 +02:00
Srivatsa S. Bhat	cb38ed5cf1	cpufreq: Prevent problems in update_policy_cpu() if last_cpu == new_cpu If update_policy_cpu() is invoked with the existing policy->cpu itself as the new-cpu parameter, then a lot of things can go terribly wrong. In its present form, update_policy_cpu() always assumes that the new-cpu is different from policy->cpu and invokes other functions to perform their respective updates. And those functions implement the actual update like this: per_cpu(..., new_cpu) = per_cpu(..., last_cpu); per_cpu(..., last_cpu) = NULL; Thus, when new_cpu == last_cpu, the final NULL assignment makes the per-cpu references vanish into thin air! (memory leak). From there, it leads to more problems: cpufreq_stats_create_table() now doesn't find the per-cpu reference and hence tries to create a new sysfs-group; but sysfs already had created the group earlier, so it complains that it cannot create a duplicate filename. In short, the repercussions of a rather innocuous invocation of update_policy_cpu() can turn out to be pretty nasty. Ideally update_policy_cpu() should handle this situation (new == last) gracefully, and not lead to such severe problems. So fix it by adding an appropriate check. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Tested-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-09-11 23:29:57 +02:00
Srivatsa S. Bhat	61173f256a	cpufreq: Restructure if/else block to avoid unintended behavior In __cpufreq_remove_dev_prepare(), the code which decides whether to remove the sysfs link or nominate a new policy cpu, is governed by an if/else block with a rather complex set of conditionals. Worse, they harbor a subtlety which leads to certain unintended behavior. The code looks like this: if (cpu != policy->cpu && !frozen) { sysfs_remove_link(&dev->kobj, "cpufreq"); } else if (cpus > 1) { new_cpu = cpufreq_nominate_new_policy_cpu(...); ... update_policy_cpu(..., new_cpu); } The original intention was: If the CPU going offline is not policy->cpu, just remove the link. On the other hand, if the CPU going offline is the policy->cpu itself, handover the policy->cpu job to some other surviving CPU in that policy. But because the 'if' condition also includes the 'frozen' check, now there are two possibilities by which we can enter the 'else' block: 1. cpu == policy->cpu (intended) 2. cpu != policy->cpu && frozen (unintended) Due to the second (unintended) scenario, we end up spuriously nominating a CPU as the policy->cpu, even when the existing policy->cpu is alive and well. This can cause problems further down the line, especially when we end up nominating the same policy->cpu as the new one (ie., old == new), because it totally confuses update_policy_cpu(). To avoid this mess, restructure the if/else block to only do what was originally intended, and thus prevent any unwelcome surprises. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Tested-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-09-11 23:29:57 +02:00
Srivatsa S. Bhat	0d66b91ebf	cpufreq: Fix crash in cpufreq-stats during suspend/resume Stephen Warren reported that the cpufreq-stats code hits a NULL pointer dereference during the second attempt to suspend a system. He also pin-pointed the problem to commit `5302c3f` "cpufreq: Perform light-weight init/teardown during suspend/resume". That commit actually ensured that the cpufreq-stats table and the cpufreq-stats sysfs entries are not torn down (ie., not freed) during suspend/resume, which makes it all the more surprising. However, it turns out that the root-cause is not that we access an already freed memory, but that the reference to the allocated memory gets moved around and we lose track of that during resume, leading to the reported crash in a subsequent suspend attempt. In the suspend path, during CPU offline, the value of policy->cpu is updated by choosing one of the surviving CPUs in that policy, as long as there is atleast one CPU in that policy. And cpufreq_stats_update_policy_cpu() is invoked to update the reference to the stats structure by assigning it to the new CPU. However, in the resume path, during CPU online, we end up assigning a fresh CPU as the policy->cpu, without letting cpufreq-stats know about this. Thus the reference to the stats structure remains (incorrectly) associated with the old CPU. So, in a subsequent suspend attempt, during CPU offline, we end up accessing an incorrect location to get the stats structure, which eventually leads to the NULL pointer dereference. Fix this by letting cpufreq-stats know about the update of the policy->cpu during CPU online in the resume path. (Also, move the update_policy_cpu() function higher up in the file, so that __cpufreq_add_dev() can invoke it). Reported-and-tested-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-09-11 23:29:57 +02:00
Rafael J. Wysocki	798282a871	Revert "cpufreq: make sure frequency transitions are serialized" Commit `7c30ed5` (cpufreq: make sure frequency transitions are serialized) attempted to serialize frequency transitions by adding checks to the CPUFREQ_PRECHANGE and CPUFREQ_POSTCHANGE notifications. However, it assumed that the notifications will always originate from the driver's .target() callback, but they also can be triggered by cpufreq_out_of_sync() and that leads to warnings like this on some systems: WARNING: CPU: 0 PID: 14543 at drivers/cpufreq/cpufreq.c:317 __cpufreq_notify_transition+0x238/0x260() In middle of another frequency transition accompanied by a call trace similar to this one: [<ffffffff81720daa>] dump_stack+0x46/0x58 [<ffffffff8106534c>] warn_slowpath_common+0x8c/0xc0 [<ffffffff815b8560>] ? acpi_cpufreq_target+0x320/0x320 [<ffffffff81065436>] warn_slowpath_fmt+0x46/0x50 [<ffffffff815b1ec8>] __cpufreq_notify_transition+0x238/0x260 [<ffffffff815b33be>] cpufreq_notify_transition+0x3e/0x70 [<ffffffff815b345d>] cpufreq_out_of_sync+0x6d/0xb0 [<ffffffff815b370c>] cpufreq_update_policy+0x10c/0x160 [<ffffffff815b3760>] ? cpufreq_update_policy+0x160/0x160 [<ffffffff81413813>] cpufreq_set_cur_state+0x8c/0xb5 [<ffffffff814138df>] processor_set_cur_state+0xa3/0xcf [<ffffffff8158e13c>] thermal_cdev_update+0x9c/0xb0 [<ffffffff8159046a>] step_wise_throttle+0x5a/0x90 [<ffffffff8158e21f>] handle_thermal_trip+0x4f/0x140 [<ffffffff8158e377>] thermal_zone_device_update+0x57/0xa0 [<ffffffff81415b36>] acpi_thermal_check+0x2e/0x30 [<ffffffff81415ca0>] acpi_thermal_notify+0x40/0xdc [<ffffffff813e7dbd>] acpi_device_notify+0x19/0x1b [<ffffffff813f8241>] acpi_ev_notify_dispatch+0x41/0x5c [<ffffffff813e3fbe>] acpi_os_execute_deferred+0x25/0x32 [<ffffffff81081060>] process_one_work+0x170/0x4a0 [<ffffffff81082121>] worker_thread+0x121/0x390 [<ffffffff81082000>] ? manage_workers.isra.20+0x170/0x170 [<ffffffff81088fe0>] kthread+0xc0/0xd0 [<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0 [<ffffffff8173582c>] ret_from_fork+0x7c/0xb0 [<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0 For this reason, revert commit `7c30ed5` along with the fix `266c13d` (cpufreq: Fix serialization of frequency transitions) on top of it and we will revisit the serialization problem later. Reported-by: Alessandro Bono <alessandro.bono@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-09-10 02:54:50 +02:00
Srivatsa S. Bhat	5136fa5658	cpufreq: Use signed type for 'ret' variable, to store negative error values There are places where the variable 'ret' is declared as unsigned int and then used to store negative return values such as -EINVAL. Fix them by declaring the variable as a signed quantity. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-09-10 02:49:48 +02:00
Srivatsa S. Bhat	56d07db274	cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes Commit "cpufreq: serialize calls to __cpufreq_governor()" had been a temporary and partial solution to the race condition between writing to a cpufreq sysfs file and taking a CPU offline. Now that we have a proper and complete solution to that problem, remove the temporary fix. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-09-10 02:49:47 +02:00
Srivatsa S. Bhat	4f750c9308	cpufreq: Synchronize the cpufreq store_() routines with CPU hotplug The functions that are used to write to cpufreq sysfs files (such as store_scaling_max_freq()) are not hotplug safe. They can race with CPU hotplug tasks and lead to problems such as trying to acquire an already destroyed timer-mutex etc. Eg: __cpufreq_remove_dev() __cpufreq_governor(policy, CPUFREQ_GOV_STOP); policy->governor->governor(policy, CPUFREQ_GOV_STOP); cpufreq_governor_dbs() case CPUFREQ_GOV_STOP: mutex_destroy(&cpu_cdbs->timer_mutex) cpu_cdbs->cur_policy = NULL; <PREEMPT> store() __cpufreq_set_policy() __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS); policy->governor->governor(policy, CPUFREQ_GOV_LIMITS); case CPUFREQ_GOV_LIMITS: mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex) if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL So use get_online_cpus()/put_online_cpus() in the store_() functions, to synchronize with CPU hotplug. However, there is an additional point to note here: some parts of the CPU teardown in the cpufreq subsystem are done in the CPU_POST_DEAD stage, with cpu_hotplug.lock released. So, using the get/put_online_cpus() functions alone is insufficient; we should also ensure that we don't race with those latter steps in the hotplug sequence. We can easily achieve this by checking if the CPU is online before proceeding with the store, since the CPU would have been marked offline by the time the CPU_POST_DEAD notifiers are executed. Reported-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-09-10 02:49:47 +02:00
Srivatsa S. Bhat	1aee40ac9c	cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock __cpufreq_remove_dev_finish() handles the kobject cleanup for a CPU going offline. But because we destroy the kobject towards the end of the CPU offline phase, there are certain race windows where a task can try to write to a cpufreq sysfs file (eg: using store_scaling_max_freq()) while we are taking that CPU offline, and this can bump up the kobject refcount, which in turn might hinder the CPU offline task from running to completion. (It can also cause other more serious problems such as trying to acquire a destroyed timer-mutex etc., depending on the exact stage of the cleanup at which the task managed to take a new refcount). To fix the race window, we will need to synchronize those store_() call-sites with CPU hotplug, using get_online_cpus()/put_online_cpus(). However, that in turn can cause a total deadlock because it can end up waiting for the CPU offline task to complete, with incremented refcount! Write to sysfs CPU offline task -------------- ---------------- kobj_refcnt++ Acquire cpu_hotplug.lock get_online_cpus(); Wait for kobj_refcnt to drop to zero DEADLOCK* A simple way to avoid this problem is to perform the kobject cleanup in the CPU offline path, with the cpu_hotplug.lock released. That is, we can perform the wait-for-kobj-refcnt-to-drop as well as the subsequent cleanup in the CPU_POST_DEAD stage of CPU offline, which is run with cpu_hotplug.lock released. Doing this helps us avoid deadlocks due to holding kobject refcounts and waiting on each other on the cpu_hotplug.lock. (Note: We can't move all of the cpufreq CPU offline steps to the CPU_POST_DEAD stage, because certain things such as stopping the governors have to be done before the outgoing CPU is marked offline. So retain those parts in the CPU_DOWN_PREPARE stage itself). Reported-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-09-10 02:49:47 +02:00
Srivatsa S. Bhat	cedb70afd0	cpufreq: Split __cpufreq_remove_dev() into two parts During CPU offline, the cpufreq core invokes __cpufreq_remove_dev() to perform work such as stopping the cpufreq governor, clearing the CPU from the policy structure etc, and finally cleaning up the kobject. There are certain subtle issues related to the kobject cleanup, and it would be much easier to deal with them if we separate that part from the rest of the cleanup-work in the CPU offline phase. So split the __cpufreq_remove_dev() function into 2 parts: one that handles the kobject cleanup, and the other that handles the rest of the work. Reported-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-09-10 02:49:46 +02:00
Viresh Kumar	19c763031a	cpufreq: serialize calls to __cpufreq_governor() We can't take a big lock around __cpufreq_governor() as this causes recursive locking for some cases. But calls to this routine must be serialized for every policy. Otherwise we can see some unpredictable events. For example, consider following scenario: __cpufreq_remove_dev() __cpufreq_governor(policy, CPUFREQ_GOV_STOP); policy->governor->governor(policy, CPUFREQ_GOV_STOP); cpufreq_governor_dbs() case CPUFREQ_GOV_STOP: mutex_destroy(&cpu_cdbs->timer_mutex) cpu_cdbs->cur_policy = NULL; <PREEMPT> store() __cpufreq_set_policy() __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS); policy->governor->governor(policy, CPUFREQ_GOV_LIMITS); case CPUFREQ_GOV_LIMITS: mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex) if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL And so store() will eventually result in a crash if cur_policy is NULL at this point. Introduce an additional variable which would guarantee serialization here. Reported-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-09-10 02:49:46 +02:00
Viresh Kumar	f73d393384	cpufreq: don't allow governor limits to be changed when it is disabled __cpufreq_governor() returns with -EBUSY when governor is already stopped and we try to stop it again, but when it is stopped we must not allow calls to CPUFREQ_GOV_LIMITS event as well. This patch adds this check in __cpufreq_governor(). Reported-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-09-10 02:49:45 +02:00
Li Zhong	5025d628c8	cpufreq: fix bad unlock balance on !CONFIG_SMP This patch tries to fix lockdep complaint attached below. It seems that we should always read acquire the cpufreq_rwsem, whether CONFIG_SMP is enabled or not. And CONFIG_HOTPLUG_CPU depends on CONFIG_SMP, so it seems we don't need CONFIG_SMP for the code enabled by CONFIG_HOTPLUG_CPU. [ 0.504191] ===================================== [ 0.504627] [ BUG: bad unlock balance detected! ] [ 0.504627] 3.11.0-rc6-next-20130819 #1 Not tainted [ 0.504627] ------------------------------------- [ 0.504627] swapper/1 is trying to release lock (cpufreq_rwsem) at: [ 0.504627] [<ffffffff813d927a>] cpufreq_add_dev+0x13a/0x3e0 [ 0.504627] but there are no more locks to release! [ 0.504627] [ 0.504627] other info that might help us debug this: [ 0.504627] 1 lock held by swapper/1: [ 0.504627] #0: (subsys mutex#4){+.+.+.}, at: [<ffffffff8134a7bf>] subsys_interface_register+0x4f/0xe0 [ 0.504627] [ 0.504627] stack backtrace: [ 0.504627] CPU: 0 PID: 1 Comm: swapper Not tainted 3.11.0-rc6-next-20130819 #1 [ 0.504627] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 0.504627] ffffffff813d927a ffff88007f847c98 ffffffff814c062b ffff88007f847cc8 [ 0.504627] ffffffff81098bce ffff88007f847cf8 ffffffff81aadc30 ffffffff813d927a [ 0.504627] 00000000ffffffff ffff88007f847d68 ffffffff8109d0be 0000000000000006 [ 0.504627] Call Trace: [ 0.504627] [<ffffffff813d927a>] ? cpufreq_add_dev+0x13a/0x3e0 [ 0.504627] [<ffffffff814c062b>] dump_stack+0x19/0x1b [ 0.504627] [<ffffffff81098bce>] print_unlock_imbalance_bug+0xfe/0x110 [ 0.504627] [<ffffffff813d927a>] ? cpufreq_add_dev+0x13a/0x3e0 [ 0.504627] [<ffffffff8109d0be>] lock_release_non_nested+0x1ee/0x310 [ 0.504627] [<ffffffff81099d0e>] ? mark_held_locks+0xae/0x120 [ 0.504627] [<ffffffff811510cb>] ? kfree+0xcb/0x1d0 [ 0.504627] [<ffffffff813d77ea>] ? cpufreq_policy_free+0x4a/0x60 [ 0.504627] [<ffffffff813d927a>] ? cpufreq_add_dev+0x13a/0x3e0 [ 0.504627] [<ffffffff8109d2a4>] lock_release+0xc4/0x250 [ 0.504627] [<ffffffff8106c9f3>] up_read+0x23/0x40 [ 0.504627] [<ffffffff813d927a>] cpufreq_add_dev+0x13a/0x3e0 [ 0.504627] [<ffffffff8134a809>] subsys_interface_register+0x99/0xe0 [ 0.504627] [<ffffffff81b19f3b>] ? cpufreq_gov_dbs_init+0x12/0x12 [ 0.504627] [<ffffffff813d7f0d>] cpufreq_register_driver+0x9d/0x1d0 [ 0.504627] [<ffffffff81b19f3b>] ? cpufreq_gov_dbs_init+0x12/0x12 [ 0.504627] [<ffffffff81b1a039>] acpi_cpufreq_init+0xfe/0x1f8 [ 0.504627] [<ffffffff810002ba>] do_one_initcall+0xda/0x180 [ 0.504627] [<ffffffff81ae301e>] kernel_init_freeable+0x12c/0x1bb [ 0.504627] [<ffffffff81ae2841>] ? do_early_param+0x8c/0x8c [ 0.504627] [<ffffffff814b4dd0>] ? rest_init+0x140/0x140 [ 0.504627] [<ffffffff814b4dde>] kernel_init+0xe/0xf0 [ 0.504627] [<ffffffff814d029a>] ret_from_fork+0x7a/0xb0 [ 0.504627] [<ffffffff814b4dd0>] ? rest_init+0x140/0x140 Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Acked-and-tested-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-21 02:04:31 +02:00
Viresh Kumar	1b27429446	cpufreq: Use cpufreq_policy_list for iterating over policies To iterate over all policies we currently iterate over all online CPUs and then get the policy for each of them which is suboptimal. Use the newly created cpufreq_policy_list for this purpose instead. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-20 15:43:50 +02:00
Viresh Kumar	474deff744	cpufreq: remove cpufreq_policy_cpu per-cpu variable cpufreq_policy_cpu per-cpu variables are used for storing the ID of the CPU that manages the given CPU's policy. However, we also store a policy pointer for each cpu in cpufreq_cpu_data, so the cpufreq_policy_cpu information is simply redundant. It is better to use cpufreq_cpu_data to retrieve a policy and get policy->cpu from there, so make that happen everywhere and drop the cpufreq_policy_cpu per-cpu variables which aren't necessary any more. [rjw: Changelog] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-20 15:43:50 +02:00
Viresh Kumar	9e9fd80167	cpufreq: remove unnecessary check in __cpufreq_governor() We don't need to check if event is CPUFREQ_GOV_POLICY_INIT and put governor module as we are sure event can only be START/STOP here. Remove the useless check. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-20 15:43:50 +02:00
Viresh Kumar	9515f4d69b	cpufreq: remove policy from cpufreq_policy_list during suspend cpufreq_policy_list is a list of active policies. We do remove policies from this list when all CPUs belonging to that policy are removed. But during system suspend we don't really free a policy struct as it will be used again during resume, so we didn't remove it from cpufreq_policy_list as well.. However, this is incorrect. We are saying this policy isn't valid anymore and must not be referenced (though we haven't freed it), but it can still be used by code that iterates over cpufreq_policy_list. Remove policy from this list during system suspend as well. Of course, we must add it back whenever the first CPU belonging to that policy shows up. [rjw: Changelog] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-20 15:43:50 +02:00
Viresh Kumar	edab2fbc21	cpufreq: Fix white space in __cpufreq_remove_dev() Align closing brace '}' of an if block. [rjw: Subject and changelog] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-20 15:43:50 +02:00
Rafael J. Wysocki	878f6e074e	Revert "cpufreq: Use cpufreq_policy_list for iterating over policies" Revert commit `eb60852` (cpufreq: Use cpufreq_policy_list for iterating over policies), because it breaks system suspend/resume on multiple machines. It either causes resume to block indefinitely or causes the BUG_ON() in lock_policy_rwsem_##mode() to trigger on sysfs accesses to cpufreq attributes. Conflicts: drivers/cpufreq/cpufreq.c	2013-08-18 15:35:59 +02:00
Viresh Kumar	3de9bdeb28	cpufreq: improve error checking on return values of __cpufreq_governor() The __cpufreq_governor() function can fail in rare cases especially if there are bugs in cpufreq drivers. Thus we must stop processing as soon as this routine fails, otherwise it may result in undefined behavior. This patch adds error checking code whenever this routine is called from any place. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-10 03:24:48 +02:00
Viresh Kumar	6eed9404ab	cpufreq: Use rwsem for protecting critical sections Critical sections of the cpufreq core are protected with the help of the driver module owner's refcount, which isn't the correct approach, because it causes rmmod to return an error when some routine has updated that refcount. Let's use rwsem for this purpose instead. Only cpufreq_unregister_driver() will use write sem and everybody else will use read sem. [rjw: Subject & changelog] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-10 03:24:47 +02:00
Viresh Kumar	fe492f3f03	cpufreq: Fix broken usage of governor->owner's refcount The cpufreq governor owner refcount usage is broken. We should only increment that refcount when a CPUFREQ_GOV_POLICY_INIT event has come and it should only be decremented if CPUFREQ_GOV_POLICY_EXIT has come. Currently, there can be situations where the governor is in use, but we have allowed it to be unloaded which may result in undefined behavior. Let's fix it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-10 03:24:47 +02:00
Viresh Kumar	eb608521f1	cpufreq: Use cpufreq_policy_list for iterating over policies To iterate over all policies we currently iterate over all CPUs and then get the policy for each of them. Let's use the newly created cpufreq_policy_list for this purpose. [rjw: Changelog] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-10 03:24:46 +02:00
Lukasz Majewski	c88a1f8b96	cpufreq: Store cpufreq policies in a list Policies available in the cpufreq framework are now linked together. They are accessible via cpufreq_policy_list defined in the cpufreq core. [rjw: Fix from Yinghai Lu folded in] Signed-off-by: Lukasz Majewski <l.majewski@samsung.com> Signed-off-by: Myungjoo Ham <myungjoo.ham@samsung.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-10 03:24:06 +02:00
Viresh Kumar	d5b73cd870	cpufreq: Use sizeof(ptr) convetion for computing sizes Chapter 14 of Documentation/CodingStyle says: The preferred form for passing a size of a struct is the following: p = kmalloc(sizeof(p), ...); The alternative form where struct name is spelled out hurts readability and introduces an opportunity for a bug when the pointer variable type is changed but the corresponding sizeof that is passed to a memory allocator is not. This wasn't followed consistently in drivers/cpufreq, let's make it more consistent by always following this rule. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-07 23:34:10 +02:00
Viresh Kumar	3a3e9e06d0	cpufreq: Give consistent names to cpufreq_policy objects They are called policy, cur_policy, new_policy, data, etc. Just call them policy wherever possible. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-07 23:34:10 +02:00
Viresh Kumar	5ff0a26803	cpufreq: Clean up header files included in the core This patch addresses the following issues in the header files in the cpufreq core: - Include headers in ascending order, so that we don't add same many times by mistake. - <asm/> must be included after <linux/>, so that they override whatever they need to. - Remove unnecessary includes. - Don't include files already included by cpufreq.h or cpufreq_governor.h. [rjw: Changelog] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-07 23:34:09 +02:00
Rafael J. Wysocki	1133bfa6dc	Merge branch 'pm-cpufreq-ondemand' into pm-cpufreq * pm-cpufreq: cpufreq: Remove unused function __cpufreq_driver_getavg() cpufreq: Remove unused APERF/MPERF support cpufreq: ondemand: Change the calculation of target frequency	2013-08-07 23:11:43 +02:00
Viresh Kumar	d8d3b47112	cpufreq: Pass policy to cpufreq_add_policy_cpu() The caller of cpufreq_add_policy_cpu() already has a pointer to the policy structure and there is no need to look it up again in cpufreq_add_policy_cpu(). Let's pass it directly. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-07 23:02:50 +02:00
Rafael J. Wysocki	10659ab7b5	cpufreq: Avoid double kobject_put() for the same kobject in error code path The only case triggering a jump to the err_out_unregister label in __cpufreq_add_dev() is when cpufreq_add_dev_interface() fails. However, if cpufreq_add_dev_interface() fails, it calls kobject_put() for the policy kobject in its error code path and since that causes the kobject's refcount to become 0, the additional kobject_put() for the same kobject under err_out_unregister and the wait_for_completion() following it are pointless, so drop them. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2013-08-07 23:02:50 +02:00
Rafael J. Wysocki	71c3461ef7	cpufreq: Do not hold driver module references for additional policy CPUs The cpufreq core is a little inconsistent in the way it uses the driver module refcount. Namely, if __cpufreq_add_dev() is called for a CPU that doesn't share the policy object with any other CPUs, the driver module refcount it grabs to start with will be dropped by it before returning and will be equal to whatever it had been before that function was invoked. However, if the given CPU does share the policy object with other CPUs, either cpufreq_add_policy_cpu() is called to link the new CPU to the existing policy, or cpufreq_add_dev_symlink() is used to link the other CPUs sharing the policy with it to the just created policy object. In that case, because both cpufreq_add_policy_cpu() and cpufreq_add_dev_symlink() call cpufreq_cpu_get() for the given policy (the latter possibly many times) without the balancing cpufreq_cpu_put() (unless there is an error), the driver module refcount will be left by __cpufreq_add_dev() with a nonzero value (different from the initial one). To remove that inconsistency make cpufreq_add_policy_cpu() execute cpufreq_cpu_put() for the given policy before returning, which decrements the driver module refcount so that it will be equal to its initial value after __cpufreq_add_dev() returns. Also remove the cpufreq_cpu_get() call from cpufreq_add_dev_symlink(), since both the policy refcount and the driver module refcount are nonzero when it is called and they don't need to be bumped up by it. Accordingly, drop the cpufreq_cpu_put() from __cpufreq_remove_dev(), since it is only necessary to balance the cpufreq_cpu_get() called by cpufreq_add_policy_cpu() or cpufreq_add_dev_symlink(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2013-08-07 23:02:50 +02:00
Viresh Kumar	308b60e715	cpufreq: Don't pass CPU to cpufreq_add_dev_{symlink\|interface}() Pointer to struct cpufreq_policy is already passed to these routines and we don't need to send policy->cpu to them as well. So, get rid of this extra argument and use policy->cpu everywhere. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-07 23:02:49 +02:00
Viresh Kumar	e8fdde1011	cpufreq: Remove extra variables from cpufreq_add_dev_symlink() We call cpufreq_cpu_get() in cpufreq_add_dev_symlink() to increase usage refcount of policy, but not to get a policy for the given CPU. So, we don't really need to capture the return value of this routine. We can simply use policy passed as an argument to cpufreq_add_dev_symlink(). Moreover debug print is rewritten to make it more clear. [rjw: Changelog] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-07 23:02:49 +02:00
Srivatsa S. Bhat	5302c3fb2e	cpufreq: Perform light-weight init/teardown during suspend/resume Now that we have the infrastructure to perform a light-weight init/tear-down, use that in the cpufreq CPU hotplug notifier when invoked from the suspend/resume path. This also ensures that the file permissions of the cpufreq sysfs files are preserved across suspend/resume, something which commit a66b2e (cpufreq: Preserve sysfs files across suspend/resume) originally intended to do, but had to be reverted due to other problems. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-07 23:02:49 +02:00
Srivatsa S. Bhat	8414809c6a	cpufreq: Preserve policy structure across suspend/resume To perform light-weight cpu-init and teardown in the cpufreq subsystem during suspend/resume, we need to separate out the 2 main functionalities of the cpufreq CPU hotplug callbacks, as outlined below: 1. Init/tear-down of core cpufreq and CPU-specific components, which are critical to the correct functioning of the cpufreq subsystem. 2. Init/tear-down of cpufreq sysfs files during suspend/resume. The first part requires accurate updates to the policy structure such as its ->cpus and ->related_cpus masks, whereas the second part requires that the policy->kobj structure is not released or re-initialized during suspend/resume. To handle both these requirements, we need to allow updates to the policy structure throughout suspend/resume, but prevent the structure from getting freed up. Also, we must have a mechanism by which the cpu-up callbacks can restore the policy structure, without allocating things afresh. (That also helps avoid memory leaks). To achieve this, we use 2 schemes: a. Use a fallback per-cpu storage area for preserving the policy structures during suspend, so that they can be restored during resume appropriately. b. Use the 'frozen' flag to determine when to free or allocate the policy structure vs when to restore the policy from the saved fallback storage. Thus we can successfully preserve the structure across suspend/resume. Effectively, this helps us complete the separation of the 'light-weight' and the 'full' init/tear-down sequences in the cpufreq subsystem, so that this can be made use of in the suspend/resume scenario. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-07 23:02:49 +02:00
Srivatsa S. Bhat	a82fab2928	cpufreq: Introduce a flag ('frozen') to separate full vs temporary init/teardown During suspend/resume we would like to do a light-weight init/teardown of CPUs in the cpufreq subsystem and preserve certain things such as sysfs files etc across suspend/resume transitions. Add a flag called 'frozen' to help distinguish the full init/teardown sequence from the light-weight one. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-07 23:02:48 +02:00
Srivatsa S. Bhat	f9ba680d23	cpufreq: Extract the handover of policy cpu to a helper function During cpu offline, when the policy->cpu is going down, some other CPU present in the policy->cpus mask is nominated as the new policy->cpu. Extract this functionality from __cpufreq_remove_dev() and implement it in a helper function. This helps in upcoming code reorganization. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-07 23:02:48 +02:00
Srivatsa S. Bhat	e18f1682bc	cpufreq: Extract non-interface related stuff from cpufreq_add_dev_interface cpufreq_add_dev_interface() includes the work of exposing the interface to the device, as well as a lot of unrelated stuff. Move the latter to cpufreq_add_dev(), where it is more appropriate. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-07 23:02:48 +02:00
Srivatsa S. Bhat	e9698cc5d2	cpufreq: Add helper to perform alloc/free of policy structure Separate out the allocation of the cpufreq policy structure (along with its error handling) to a helper function. This makes the code easier to read and also helps with some upcoming code reorganization. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-07 23:02:48 +02:00
Srivatsa S. Bhat	23d328994b	cpufreq: Fix misplaced call to cpufreq_update_policy() The call to cpufreq_update_policy() is placed in the CPU hotplug callback of cpufreq_stats, which has a higher priority than the CPU hotplug callback of cpufreq-core. As a result, during CPU_ONLINE/CPU_ONLINE_FROZEN, we end up calling cpufreq_update_policy() before calling cpufreq_add_dev() ! And for uninitialized CPUs, it just returns silently, not doing anything. To add to that, cpufreq_stats is not even the right place to call cpufreq_update_policy() to begin with. The cpufreq core ought to handle this in its own callback, from an elegance/relevance perspective. So move the invocation of cpufreq_update_policy() to cpufreq_cpu_callback, and place it after cpufreq_add_dev(). Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-07 23:02:48 +02:00
Rafael J. Wysocki	2a99859932	cpufreq: Fix cpufreq driver module refcount balance after suspend/resume Since cpufreq_cpu_put() called by __cpufreq_remove_dev() drops the driver module refcount, __cpufreq_remove_dev() causes that refcount to become negative for the cpufreq driver after a suspend/resume cycle. This is not the only bad thing that happens there, however, because kobject_put() should only be called for the policy kobject at this point if the CPU is not the last one for that policy. Namely, if the given CPU is the last one for that policy, the policy kobject's refcount should be 1 at this point, as set by cpufreq_add_dev_interface(), and only needs to be dropped once for the kobject to go away. This actually happens under the cpu == 1 check, so it need not be done before by cpufreq_cpu_put(). On the other hand, if the given CPU is not the last one for that policy, this means that cpufreq_add_policy_cpu() has been called at least once for that policy and cpufreq_cpu_get() has been called for it too. To balance that cpufreq_cpu_get(), we need to call cpufreq_cpu_put() in that case. Thus, to fix the described problem and keep the reference counters balanced in both cases, move the cpufreq_cpu_get() call in __cpufreq_remove_dev() to the code path executed only for CPUs that share the policy with other CPUs. Reported-and-tested-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: 3.10+ <stable@vger.kernel.org>	2013-07-30 00:32:00 +02:00
Stratos Karafotis	cffe4e0e74	cpufreq: Remove unused function __cpufreq_driver_getavg() The target frequency calculation method in the ondemand governor has changed and it is now independent of the measured average frequency. Consequently, the __cpufreq_driver_getavg() function and getavg member of struct cpufreq_driver are not used any more, so drop them. [rjw: Changelog] Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-07-26 01:06:44 +02:00
Linus Torvalds	b7356abb9f	Power management and ACPI fixes for 3.11-rc2 - Two cpufreq commits from the 3.10 cycle introduced regressions. The first of them was buggy (it did way much more than it needed to do) and the second one attempted to fix an issue introduced by the first one. Fixes from Srivatsa S Bhat revert both. - If autosleep triggers during system shutdown and the shutdown callbacks of some device drivers have been called already, it may crash the system. Fix from Liu Shuo prevents that from happening by making try_to_suspend() check system_state. - The ACPI memory hotplug driver doesn't clear its driver_data on errors which may cause a NULL poiter dereference to happen later. Fix from Toshi Kani. - The ACPI namespace scanning code should not try to attach scan handlers to device objects that have them already, which may confuse things quite a bit, and it should rescan the whole namespace branch starting at the given node after receiving a bus check notify event even if the device at that particular node has been discovered already. Fixes from Rafael J Wysocki. - New ACPI video blacklist entry for a system whose initial backlight setting from the BIOS doesn't make sense. From Lan Tianyu. - Garbage string output avoindance for ACPI PNP from Liu Shuo. - Two Kconfig fixes for issues introduced recently in the s3c24xx cpufreq driver (when moving the driver to drivers/cpufreq) from Paul Bolle. - Trivial comment fix in pm_wakeup.h from Chanwoo Choi. / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIcBAABAgAGBQJR6Sq+AAoJEKhOf7ml8uNsrGQP/0HRDW+QmTGM8znDTHXngbn9 X3pqlpjEOiCtcmJaSJlD7GwLHMscwWcHKEezteaZ7KUI4mcnysJX6EY5YVbNriDC xlLcDQn9c6Xx1maCSfp+WMygvqItxZwuc8veRjrT3XtOfCNWS/FlX40Voh63BCAe GbfQ/HesmUg5CKplyD8/XypLWh5OFXmHzCe8IhrKGfhsZukXdSgSBjwQZMRrEMsQ kJjDCF8zUu0JisiWqL+xE6IFSKme9i6LBlHpzU0Y1g4RqAqkIbuS0Z3vezOYzoTD oZjBNa9XAgCS3x0l5g3G0ChgDAU+Mpji/imXA7JysrwbirGFbtPHtQYh2HzpAtnw Hkah/0ocBM7/w7VTsUQiRsFPdIJTCBLlm6J38x8yh7n84h4nJgOpK69dBLrMwCUZ f3kid6KIPVLBvnC3QSULrCAKUcUcVVWYtNho+sfXBMjP+cPwTmc3DvATnpru6twa 0KjR5o585UOcciq7EWAoMrCFCfZYF5C4XGaZAxHI/SWooxeCQH84S8vfNLL2epVC ixmLYo4X2ANDsnfbUV+ewhB0/L2905Et6NhPUgPD/1rm15MEZbowbB2K0pzr0QL9 /1hTL61InXx3jLxducJJFKN+HZ0zfDQdTkyafKrR9jb+GsdmnzYJ/vnfDG8MfPjp GZ281YBqVmUeYJh5CPU+ =IUmn -----END PGP SIGNATURE----- Merge tag 'pm+acpi-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI fixes from Rafael Wysocki: "These are fixes collected over the last week, most importnatly two cpufreq reverts fixing regressions introduced in 3.10, an autoseelp fix preventing systems using it from crashing during shutdown and two ACPI scan fixes related to hotplug. Specifics: - Two cpufreq commits from the 3.10 cycle introduced regressions. The first of them was buggy (it did way much more than it needed to do) and the second one attempted to fix an issue introduced by the first one. Fixes from Srivatsa S Bhat revert both. - If autosleep triggers during system shutdown and the shutdown callbacks of some device drivers have been called already, it may crash the system. Fix from Liu Shuo prevents that from happening by making try_to_suspend() check system_state. - The ACPI memory hotplug driver doesn't clear its driver_data on errors which may cause a NULL poiter dereference to happen later. Fix from Toshi Kani. - The ACPI namespace scanning code should not try to attach scan handlers to device objects that have them already, which may confuse things quite a bit, and it should rescan the whole namespace branch starting at the given node after receiving a bus check notify event even if the device at that particular node has been discovered already. Fixes from Rafael J Wysocki. - New ACPI video blacklist entry for a system whose initial backlight setting from the BIOS doesn't make sense. From Lan Tianyu. - Garbage string output avoindance for ACPI PNP from Liu Shuo. - Two Kconfig fixes for issues introduced recently in the s3c24xx cpufreq driver (when moving the driver to drivers/cpufreq) from Paul Bolle. - Trivial comment fix in pm_wakeup.h from Chanwoo Choi" * tag 'pm+acpi-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / video: ignore BIOS initial backlight value for Fujitsu E753 PNP / ACPI: avoid garbage in resource name cpufreq: Revert commit `2f7021a8` to fix CPU hotplug regression cpufreq: s3c24xx: fix "depends on ARM_S3C24XX" in Kconfig cpufreq: s3c24xx: rename CONFIG_CPU_FREQ_S3C24XX_DEBUGFS PM / Sleep: Fix comment typo in pm_wakeup.h PM / Sleep: avoid 'autosleep' in shutdown progress cpufreq: Revert commit a66b2e to fix suspend/resume regression ACPI / memhotplug: Fix a stale pointer in error path ACPI / scan: Always call acpi_bus_scan() for bus check notifications ACPI / scan: Do not try to attach scan handlers to devices having them	2013-07-19 09:59:06 -07:00
Paul Gortmaker	2760984f65	cpufreq: delete __cpuinit usage from all cpufreq files The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit `5e427ec2d0` ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. This removes all the drivers/cpufreq uses of the __cpuinit macros from all C files. [1] https://lkml.org/lkml/2013/5/20/589 [v2: leave 2nd lines of args misaligned as requested by Viresh] Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: cpufreq@vger.kernel.org Cc: linux-pm@vger.kernel.org Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2013-07-14 19:36:57 -04:00
Srivatsa S. Bhat	aae760ed21	cpufreq: Revert commit a66b2e to fix suspend/resume regression commit a66b2e (cpufreq: Preserve sysfs files across suspend/resume) has unfortunately caused several things in the cpufreq subsystem to break subtly after a suspend/resume cycle. The intention of that patch was to retain the file permissions of the cpufreq related sysfs files across suspend/resume. To achieve that, the commit completely removed the calls to cpufreq_add_dev() and __cpufreq_remove_dev() during suspend/resume transitions. But the problem is that those functions do 2 kinds of things: 1. Low-level initialization/tear-down that are critical to the correct functioning of cpufreq-core. 2. Kobject and sysfs related initialization/teardown. Ideally we should have reorganized the code to cleanly separate these two responsibilities, and skipped only the sysfs related parts during suspend/resume. Since we skipped the entire callbacks instead (which also included some CPU and cpufreq-specific critical components), cpufreq subsystem started behaving erratically after suspend/resume. So revert the commit to fix the regression. We'll revisit and address the original goal of that commit separately, since it involves quite a bit of careful code reorganization and appears to be non-trivial. (While reverting the commit, note that another commit `f51e1eb` (cpufreq: Fix cpufreq regression after suspend/resume) already reverted part of the original set of changes. So revert only the remaining ones). Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Paul Bolle <pebolle@tiscali.nl> Cc: 3.10+ <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-07-15 01:31:36 +02:00
Viresh Kumar	266c13d767	cpufreq: Fix serialization of frequency transitions Commit 7c30ed ("cpufreq: make sure frequency transitions are serialized") interacts poorly with systems that have a single core freqency for all cores. On such systems we have a single policy for all cores with several CPUs. When we do a frequency transition the governor calls the pre and post change notifiers which causes cpufreq_notify_transition() per CPU. Since the policy is the same for all of them all CPUs after the first and the warnings added are generated by checking a per-policy flag the warnings will be triggered for all cores after the first. Fix this by allowing notifier to be called for n times. Where n is the number of cpus in policy->cpus. Reported-and-tested-by: Mark Brown <broonie@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-07-04 13:12:44 +02:00
Lan Tianyu	f4fd379784	acpi-cpufreq: Add new sysfs attribute freqdomain_cpus Commits `fcf8058` (cpufreq: Simplify cpufreq_add_dev()) and `aa77a52` (cpufreq: acpi-cpufreq: Don't set policy->related_cpus from .init()) changed the contents of the "related_cpus" sysfs attribute on systems where acpi-cpufreq is used and user space can't get the list of CPUs which are in the same hardware coordination CPU domain (provided by the ACPI AML method _PSD) via "related_cpus" any more. To make up for that loss add a new sysfs attribute "freqdomian_cpus" for the acpi-cpufreq driver which exposes the list of CPUs in the same domain regardless of whether it is coordinated by hardware or software. [rjw: Changelog, documentation] References: https://bugzilla.kernel.org/show_bug.cgi?id=58761 Reported-by: Jean-Philippe Halimi <jean-philippe.halimi@exascale-computing.eu> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-06-27 21:51:09 +02:00
Viresh Kumar	7c30ed532c	cpufreq: make sure frequency transitions are serialized Whenever we are changing frequency of a cpu, we are calling PRECHANGE and POSTCHANGE notifiers. They must be serialized. i.e. PRECHANGE or POSTCHANGE shouldn't be called twice contiguously. This can happen due to bugs in users of __cpufreq_driver_target() or actual cpufreq drivers who are sending these notifiers. This patch adds some protection against this. Now, we keep track of the last transaction and see if something went wrong. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-06-27 21:49:55 +02:00
Viresh Kumar	0956df9c84	cpufreq: make __cpufreq_notify_transition() static __cpufreq_notify_transition() is used only in cpufreq.c, make it static. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-06-21 01:08:16 +02:00
Viresh Kumar	bb176f7d03	cpufreq: Fix minor formatting issues There were a few noticeable formatting issues in core cpufreq code. This cleans them up to make code look better. The changes include: - Whitespace cleanup. - Rearrangements of code. - Multiline comments fixes. - Formatting changes to fit 80 columns. Copyright information in cpufreq.c is also updated to include my name for 2013. [rjw: Changelog] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-06-21 01:06:34 +02:00
Xiaoguang Chen	95731ebb11	cpufreq: Fix governor start/stop race condition Cpufreq governors' stop and start operations should be carried out in sequence. Otherwise, there will be unexpected behavior, like in the example below. Suppose there are 4 CPUs and policy->cpu=CPU0, CPU1/2/3 are linked to CPU0. The normal sequence is: 1) Current governor is userspace. An application tries to set the governor to ondemand. It will call __cpufreq_set_policy() in which it will stop the userspace governor and then start the ondemand governor. 2) Current governor is userspace. The online of CPU3 runs on CPU0. It will call cpufreq_add_policy_cpu() in which it will first stop the userspace governor, and then start it again. If the sequence of the above two cases interleaves, it becomes: 1) Application stops userspace governor 2) Hotplug stops userspace governor which is a problem, because the governor shouldn't be stopped twice in a row. What happens next is: 3) Application starts ondemand governor 4) Hotplug starts a governor In step 4, the hotplug is supposed to start the userspace governor, but now the governor has been changed by the application to ondemand, so the ondemand governor is started once again, which is incorrect. The solution is to prevent policy governors from being stopped multiple times in a row. A governor should only be stopped once for one policy. After it has been stopped, no more governor stop operations should be executed. Also add a mutex to serialize governor operations. [rjw: Changelog. And you owe me a beverage of my choice.] Signed-off-by: Xiaoguang Chen <chenxg@marvell.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-06-21 00:56:04 +02:00
Viresh Kumar	a262e94cdc	cpufreq: remove unnecessary cpufreq_cpu_{get\|put}() calls struct cpufreq_policy is already passed as argument to some routines like: __cpufreq_driver_getavg() and so we don't really need to do cpufreq_cpu_get() before and cpufreq_cpu_put() in them to get a policy structure. Remove them. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-06-04 14:26:27 +02:00
Viresh Kumar	2361be2366	cpufreq: Don't create empty /sys/devices/system/cpu/cpufreq directory When we don't have any file in cpu/cpufreq directory we shouldn't create it. Specially with the introduction of per-policy governor instance patchset, even governors are moved to cpu/cpu*/cpufreq/governor-name directory and so this directory is just not required. Lets have it only when required. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-05-27 13:24:02 +02:00
Viresh Kumar	72a4ce340a	cpufreq: Move get_cpu_idle_time() to cpufreq.c Governors other than ondemand and conservative can also use get_cpu_idle_time() and they aren't required to compile cpufreq_governor.c. So, move these independent routines to cpufreq.c instead. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-05-27 13:20:56 +02:00
Viresh Kumar	944e9a0316	cpufreq: governors: Move get_governor_parent_kobj() to cpufreq.c get_governor_parent_kobj() can be used by any governor, generic cpufreq governors or platform specific ones and so must be present in cpufreq.c instead of cpufreq_governor.c. This patch moves it to cpufreq.c. This also adds EXPORT_SYMBOL_GPL(get_governor_parent_kobj) so that modules can use this function too. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-05-27 13:20:56 +02:00
Viresh Kumar	3f869d6d41	cpufreq: Add EXPORT_SYMBOL_GPL for have_governor_per_policy This patch adds: EXPORT_SYMBOL_GPL(have_governor_per_policy), so that this routine can be used by modules too. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-05-27 13:20:55 +02:00
Viresh Kumar	955ef48335	cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT With the rwsem lock around __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT), we get circular dependency when we call sysfs_remove_group(). ====================================================== [ INFO: possible circular locking dependency detected ] 3.9.0-rc7+ #15 Not tainted ------------------------------------------------------- cat/2387 is trying to acquire lock: (&per_cpu(cpu_policy_rwsem, cpu)){+++++.}, at: [<c02f6179>] lock_policy_rwsem_read+0x25/0x34 but task is already holding lock: (s_active#41){++++.+}, at: [<c00f9bf7>] sysfs_read_file+0x4f/0xcc which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (s_active#41){++++.+}: [<c0055a79>] lock_acquire+0x61/0xbc [<c00fabf1>] sysfs_addrm_finish+0xc1/0x128 [<c00f9819>] sysfs_hash_and_remove+0x35/0x64 [<c00fbe6f>] remove_files.isra.0+0x1b/0x24 [<c00fbea5>] sysfs_remove_group+0x2d/0xa8 [<c02f9a0b>] cpufreq_governor_interactive+0x13b/0x35c [<c02f61df>] __cpufreq_governor+0x2b/0x8c [<c02f6579>] __cpufreq_set_policy+0xa9/0xf8 [<c02f6b75>] store_scaling_governor+0x61/0x100 [<c02f6f4d>] store+0x39/0x60 [<c00f9b81>] sysfs_write_file+0xed/0x114 [<c00b3fd1>] vfs_write+0x65/0xd8 [<c00b424b>] sys_write+0x2f/0x50 [<c000cdc1>] ret_fast_syscall+0x1/0x52 -> #0 (&per_cpu(cpu_policy_rwsem, cpu)){+++++.}: [<c0055253>] __lock_acquire+0xef3/0x13dc [<c0055a79>] lock_acquire+0x61/0xbc [<c03ee1f5>] down_read+0x25/0x30 [<c02f6179>] lock_policy_rwsem_read+0x25/0x34 [<c02f6edd>] show+0x21/0x58 [<c00f9c0f>] sysfs_read_file+0x67/0xcc [<c00b40a7>] vfs_read+0x63/0xd8 [<c00b41fb>] sys_read+0x2f/0x50 [<c000cdc1>] ret_fast_syscall+0x1/0x52 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(s_active#41); lock(&per_cpu(cpu_policy_rwsem, cpu)); lock(s_active#41); lock(&per_cpu(cpu_policy_rwsem, cpu)); * DEADLOCK * 2 locks held by cat/2387: #0: (&buffer->mutex){+.+.+.}, at: [<c00f9bcd>] sysfs_read_file+0x25/0xcc #1: (s_active#41){++++.+}, at: [<c00f9bf7>] sysfs_read_file+0x4f/0xcc stack backtrace: [<c0011d55>] (unwind_backtrace+0x1/0x9c) from [<c03e9a09>] (print_circular_bug+0x19d/0x1e8) [<c03e9a09>] (print_circular_bug+0x19d/0x1e8) from [<c0055253>] (__lock_acquire+0xef3/0x13dc) [<c0055253>] (__lock_acquire+0xef3/0x13dc) from [<c0055a79>] (lock_acquire+0x61/0xbc) [<c0055a79>] (lock_acquire+0x61/0xbc) from [<c03ee1f5>] (down_read+0x25/0x30) [<c03ee1f5>] (down_read+0x25/0x30) from [<c02f6179>] (lock_policy_rwsem_read+0x25/0x34) [<c02f6179>] (lock_policy_rwsem_read+0x25/0x34) from [<c02f6edd>] (show+0x21/0x58) [<c02f6edd>] (show+0x21/0x58) from [<c00f9c0f>] (sysfs_read_file+0x67/0xcc) [<c00f9c0f>] (sysfs_read_file+0x67/0xcc) from [<c00b40a7>] (vfs_read+0x63/0xd8) [<c00b40a7>] (vfs_read+0x63/0xd8) from [<c00b41fb>] (sys_read+0x2f/0x50) [<c00b41fb>] (sys_read+0x2f/0x50) from [<c000cdc1>] (ret_fast_syscall+0x1/0x52) This lock isn't required while calling __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT). Remove it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-05-22 00:23:54 +02:00
Srivatsa S. Bhat	a66b2e503f	cpufreq: Preserve sysfs files across suspend/resume The file permissions of cpufreq per-cpu sysfs files are not preserved across suspend/resume because we internally go through the CPU Hotplug path which reinitializes the file permissions on CPU online. But the user is not supposed to know that we are using CPU hotplug internally within suspend/resume (IOW, the kernel should not silently wreck the user-set file permissions across a suspend cycle). Therefore, we need to preserve the file permissions as they are across suspend/resume. The simplest way to achieve that is to just not touch the sysfs files at all - ie., just ignore the CPU hotplug notifications in the suspend/resume path (_FROZEN) in the cpufreq hotplug callback. Reported-by: Robert Jarzmik <robert.jarzmik@intel.com> Reported-by: Durgadoss R <durgadoss.r@intel.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-05-15 21:47:17 +02:00
Viresh Kumar	d96038e0fa	cpufreq: Issue CPUFREQ_GOV_POLICY_EXIT notifier before dropping policy refcount We must call __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT) before calling cpufreq_cpu_put(data), so that policy kobject have valid fields. Otherwise, removing last online cpu of policy->cpus causes this crash for ondemand/conservative governor. [<c00fb076>] (sysfs_find_dirent+0xe/0xa8) from [<c00fb1bd>] (sysfs_get_dirent+0x21/0x58) [<c00fb1bd>] (sysfs_get_dirent+0x21/0x58) from [<c00fc259>] (sysfs_remove_group+0x85/0xbc) [<c00fc259>] (sysfs_remove_group+0x85/0xbc) from [<c02faad9>] (cpufreq_governor_dbs+0x369/0x4a0) [<c02faad9>] (cpufreq_governor_dbs+0x369/0x4a0) from [<c02f66d7>] (__cpufreq_governor+0x2b/0x8c) [<c02f66d7>] (__cpufreq_governor+0x2b/0x8c) from [<c02f6893>] (__cpufreq_remove_dev.isra.12+0x15b/0x250) [<c02f6893>] (__cpufreq_remove_dev.isra.12+0x15b/0x250) from [<c03e91c7>] (cpufreq_cpu_callback+0x2f/0x3c) [<c03e91c7>] (cpufreq_cpu_callback+0x2f/0x3c) from [<c0036fe1>] (notifier_call_chain+0x45/0x54) [<c0036fe1>] (notifier_call_chain+0x45/0x54) from [<c001e611>] (__cpu_notify+0x1d/0x34) [<c001e611>] (__cpu_notify+0x1d/0x34) from [<c03e5833>] (_cpu_down+0x63/0x1ac) [<c03e5833>] (_cpu_down+0x63/0x1ac) from [<c03e5997>] (cpu_down+0x1b/0x30) [<c03e5997>] (cpu_down+0x1b/0x30) from [<c03e60eb>] (store_online+0x27/0x54) [<c03e60eb>] (store_online+0x27/0x54) from [<c0295629>] (dev_attr_store+0x11/0x18) [<c0295629>] (dev_attr_store+0x11/0x18) from [<c00f9edd>] (sysfs_write_file+0xed/0x114) [<c00f9edd>] (sysfs_write_file+0xed/0x114) from [<c00b42a9>] (vfs_write+0x65/0xd8) [<c00b42a9>] (vfs_write+0x65/0xd8) from [<c00b4523>] (sys_write+0x2f/0x50) [<c00b4523>] (sys_write+0x2f/0x50) from [<c000cdc1>] (ret_fast_syscall+0x1/0x52) Of course this only impacted drivers which have have_governor_per_policy set to true. i.e. big LITTLE cpufreq driver. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-05-12 14:04:16 +02:00
Rafael J. Wysocki	1c3d85dd4e	cpufreq: Revert incorrect commit `5800043` Commit `5800043` (cpufreq: convert cpufreq_driver to using RCU) causes the following call trace to be spit on boot: BUG: sleeping function called from invalid context at /scratch/rafael/work/linux-pm/mm/slab.c:3179 in_atomic(): 0, irqs_disabled(): 0, pid: 292, name: systemd-udevd 2 locks held by systemd-udevd/292: #0: (subsys mutex){+.+.+.}, at: [<ffffffff8146851a>] subsys_interface_register+0x4a/0xe0 #1: (rcu_read_lock){.+.+.+}, at: [<ffffffff81538210>] cpufreq_add_dev_interface+0x60/0x5e0 Pid: 292, comm: systemd-udevd Not tainted 3.9.0-rc8+ #323 Call Trace: [<ffffffff81072c90>] __might_sleep+0x140/0x1f0 [<ffffffff811581c2>] kmem_cache_alloc+0x42/0x2b0 [<ffffffff811e7179>] sysfs_new_dirent+0x59/0x130 [<ffffffff811e63cb>] sysfs_add_file_mode+0x6b/0x110 [<ffffffff81538210>] ? cpufreq_add_dev_interface+0x60/0x5e0 [<ffffffff810a3254>] ? __lock_is_held+0x54/0x80 [<ffffffff811e647d>] sysfs_add_file+0xd/0x10 [<ffffffff811e6541>] sysfs_create_file+0x21/0x30 [<ffffffff81538280>] cpufreq_add_dev_interface+0xd0/0x5e0 [<ffffffff81538210>] ? cpufreq_add_dev_interface+0x60/0x5e0 [<ffffffffa000337f>] ? acpi_processor_get_platform_limit+0x32/0xbb [processor] [<ffffffffa022f540>] ? do_drv_write+0x70/0x70 [acpi_cpufreq] [<ffffffff810a3254>] ? __lock_is_held+0x54/0x80 [<ffffffff8106c97e>] ? up_read+0x1e/0x40 [<ffffffff8106e632>] ? __blocking_notifier_call_chain+0x72/0xc0 [<ffffffff81538dbd>] cpufreq_add_dev+0x62d/0xae0 [<ffffffff815389b8>] ? cpufreq_add_dev+0x228/0xae0 [<ffffffff81468569>] subsys_interface_register+0x99/0xe0 [<ffffffffa014d000>] ? 0xffffffffa014cfff [<ffffffff81535d5d>] cpufreq_register_driver+0x9d/0x200 [<ffffffffa014d000>] ? 0xffffffffa014cfff [<ffffffffa014d0e9>] acpi_cpufreq_init+0xe9/0x1000 [acpi_cpufreq] [<ffffffff810002fa>] do_one_initcall+0x11a/0x170 [<ffffffff810b4b87>] load_module+0x1cf7/0x2920 [<ffffffff81322580>] ? ddebug_proc_open+0xb0/0xb0 [<ffffffff816baee0>] ? retint_restore_args+0xe/0xe [<ffffffff810b5887>] sys_init_module+0xd7/0x120 [<ffffffff816bb6d2>] system_call_fastpath+0x16/0x1b which is quite obvious, because that commit put (multiple instances of) sysfs_create_file() under rcu_read_lock()/rcu_read_unlock(), although sysfs_create_file() may cause memory to be allocated with GFP_KERNEL and that may sleep, which is not permitted in RCU read critical section. Revert the buggy commit altogether along with some changes on top of it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-04-29 00:08:16 +02:00
Viresh Kumar	820c6ca293	cpufreq: Don't call __cpufreq_governor() for drivers without target() Some cpufreq drivers implement their own governor and so don't need us to call generic governors interface via __cpufreq_governor(). Few recent commits haven't obeyed this law well and we saw some regressions. This patch is an attempt to fix the above issue. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reported-and-tested-by: Sedat Dilek <sedat.dilek@gmail.com> Tested-by: Dirk Brandewie <dirk.brandewie@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-04-22 00:48:03 +02:00
Viresh Kumar	e4969ebac8	cpufreq: Call __cpufreq_governor() with correct policy->cpus mask __cpufreq_governor() must be called with a correct policy->cpus mask. In __cpufreq_remove_dev() we initially clear policy->cpus with cpumask_clear_cpu() and then call __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT). If the governor is doing some per-cpu stuff in EXIT callback, this can create uncertain behavior. Generic governors in drivers/cpufreq/ doesn't do any per-cpu stuff in EXIT callback and so we don't face any issues currently. But its better to keep the code clean, so we don't face any issues in future. Now, we call cpumask_clear_cpu() only when multiple cpus are managed by policy. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-04-11 22:50:09 +02:00
Nathan Zimmer	5800043b24	cpufreq: convert cpufreq_driver to using RCU We eventually would like to remove the rwlock cpufreq_driver_lock or convert it back to a spinlock and protect the read sections with RCU. The first step in that direction is to make cpufreq_driver use RCU. I don't see an easy wasy to protect the cpufreq_cpu_data structure with RCU, so I am leaving it with the rwlock for now since under certain configs __cpufreq_cpu_get is a hot spot with 256+ cores. [rjw: Subject, changelog, white space] Signed-off-by: Nathan Zimmer <nzimmer@sgi.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-04-10 13:19:26 +02:00
Viresh Kumar	b43a7ffbf3	cpufreq: Notify all policy->cpus in cpufreq_notify_transition() policy->cpus contains all online cpus that have single shared clock line. And their frequencies are always updated together. Many SMP system's cpufreq drivers take care of this in individual drivers but the best place for this code is in cpufreq core. This patch modifies cpufreq_notify_transition() to notify frequency change for all cpus in policy->cpus and hence updates all users of this API. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Stephen Warren <swarren@nvidia.com> Tested-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-04-02 15:24:00 +02:00
Viresh Kumar	4d5dcc4211	cpufreq: governor: Implement per policy instances of governors Currently, there can't be multiple instances of single governor_type. If we have a multi-package system, where we have multiple instances of struct policy (per package), we can't have multiple instances of same governor. i.e. We can't have multiple instances of ondemand governor for multiple packages. Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/ governor-name/. Which again reflects that there can be only one instance of a governor_type in the system. This is a bottleneck for multicluster system, where we want different packages to use same governor type, but with different tunables. This patch uses the infrastructure provided by earlier patch and implements init/exit routines for ondemand and conservative governors. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-04-01 01:11:34 +02:00
Viresh Kumar	7bd353a995	cpufreq: Add per policy governor-init/exit infrastructure Currently, there can't be multiple instances of single governor_type. If we have a multi-package system, where we have multiple instances of struct policy (per package), we can't have multiple instances of same governor. i.e. We can't have multiple instances of ondemand governor for multiple packages. Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/ governor-name/. Which again reflects that there can be only one instance of a governor_type in the system. This is a bottleneck for multicluster system, where we want different packages to use same governor type, but with different tunables. This patch is inclined towards providing this infrastructure. Because we are required to allocate governor's resources dynamically now, we must do it at policy creation and end. And so got CPUFREQ_GOV_POLICY_INIT/EXIT. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-04-01 01:11:34 +02:00
Nathan Zimmer	0d1857a1b9	cpufreq: Convert the cpufreq_driver_lock to a rwlock This eliminates the contention I am seeing in __cpufreq_cpu_get. It also nicely stages the lock to be replaced by the rcu. Signed-off-by: Nathan Zimmer <nzimmer@sgi.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-04-01 01:11:34 +02:00
Rafael J. Wysocki	4419fbd4b4	Merge branch 'pm-cpufreq' * pm-cpufreq: (55 commits) cpufreq / intel_pstate: Fix 32 bit build cpufreq: conservative: Fix typos in comments cpufreq: ondemand: Fix typos in comments cpufreq: exynos: simplify .init() for setting policy->cpus cpufreq: kirkwood: Add a cpufreq driver for Marvell Kirkwood SoCs cpufreq/x86: Add P-state driver for sandy bridge. cpufreq_stats: do not remove sysfs files if frequency table is not present cpufreq: Do not track governor name for scaling drivers with internal governors. cpufreq: Only call cpufreq_out_of_sync() for driver that implement cpufreq_driver.target() cpufreq: Retrieve current frequency from scaling drivers with internal governors cpufreq: Fix locking issues cpufreq: Create a macro for unlock_policy_rwsem{read,write} cpufreq: Remove unused HOTPLUG_CPU code cpufreq: governors: Fix WARN_ON() for multi-policy platforms cpufreq: ondemand: Replace down_differential tuner with adj_up_threshold cpufreq / stats: Get rid of CPUFREQ_STATDEVICE_ATTR cpufreq: Don't check cpu_online(policy->cpu) cpufreq: add imx6q-cpufreq driver cpufreq: Don't remove sysfs link for policy->cpu cpufreq: Remove unnecessary use of policy->shared_type ...	2013-02-15 13:59:07 +01:00
Dirk Brandewie	fa69e33f7d	cpufreq: Do not track governor name for scaling drivers with internal governors. Scaling drivers that implement internal governors do not have governor structures assocaited with them. Only track the name of the governor associated with the CPU if the driver does not implement cpufreq_driver.setpolicy() Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-02-09 12:55:53 +01:00
Dirk Brandewie	f6b0515b07	cpufreq: Only call cpufreq_out_of_sync() for driver that implement cpufreq_driver.target() Scaling drivers that implement cpufreq_driver.setpolicy() have internal governors that do not signal changes via cpufreq_notify_transition() so the frequncy in the policy will almost certainly be different than the current frequncy. Only call cpufreq_out_of_sync() when the underlying driver implements cpufreq_driver.target() Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-02-09 12:55:47 +01:00
Dirk Brandewie	9e21ba8bd8	cpufreq: Retrieve current frequency from scaling drivers with internal governors Scaling drivers that implement the cpufreq_driver.setpolicy() versus the cpufreq_driver.target() interface do not set policy->cur. Normally policy->cur is set during the call to cpufreq_driver.target() when the frequnecy request is made by the governor. If the scaling driver implements cpufreq_driver.setpolicy() and cpufreq_driver.get() interfaces use cpufreq_driver.get() to retrieve the current frequency. Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-02-09 12:55:03 +01:00
Viresh Kumar	2eaa3e2df1	cpufreq: Fix locking issues cpufreq core uses two locks: - cpufreq_driver_lock: General lock for driver and cpufreq_cpu_data array. - cpu_policy_rwsemfix locking: per CPU reader-writer semaphore designed to cure all cpufreq/hotplug/workqueue/etc related lock issues. These locks were not used properly and are placed against their principle (present before their definition) at various places. This patch is an attempt to fix their use. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-02-09 01:22:57 +01:00
Viresh Kumar	fa1d8af47f	cpufreq: Create a macro for unlock_policy_rwsem{read,write} On the lines of macro: lock_policy_rwsem, we can create another macro for unlock_policy_rwsem. Lets do it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-02-09 01:22:06 +01:00
Viresh Kumar	65922465b5	cpufreq: Remove unused HOTPLUG_CPU code Because the sibling cpu of any online cpu is identified very early in cpufreq_add_dev(), below code is never executed. And so can be removed. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-02-09 01:21:37 +01:00
Viresh Kumar	8e53695f7f	cpufreq: governors: Fix WARN_ON() for multi-policy platforms On multi-policy systems there is a single instance of governor for both the policies (if same governor is chosen for both policies). With the code update from following patches: `8eeed09` cpufreq: governors: Get rid of dbs_data->enable field `b394058` cpufreq: governors: Reset tunables only for cpufreq_unregister_governor() We are creating/removing sysfs directory of governor for for every call to GOV_START and STOP. This would fail for multi-policy system as there is a per-policy call to START/STOP. This patch reuses the governor->initialized variable to detect total users of governor. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-02-09 01:21:13 +01:00
Viresh Kumar	3361b7b173	cpufreq: Don't check cpu_online(policy->cpu) policy->cpu or cpus in policy->cpus can't be offline anymore. And so we don't need to check if they are online or not. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-02-09 01:18:34 +01:00
Viresh Kumar	73bf0fc2b0	cpufreq: Don't remove sysfs link for policy->cpu "cpufreq" directory in policy->cpu is never created using sysfs_create_link(), but using kobject_init_and_add(). And so we shouldn't call sysfs_remove_link() for policy->cpu(). sysfs stuff for policy->cpu is automatically removed when we call kobject_put() for dying policy. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Dirk Brandewie <dirk.brandewie@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-02-05 22:21:14 +01:00
Viresh Kumar	b394058f06	cpufreq: governors: Reset tunables only for cpufreq_unregister_governor() Currently, whenever governor->governor() is called for CPUFRREQ_GOV_START event we reset few tunables of governor. Which isn't correct, as this routine is called for every cpu hot-[un]plugging event. We should actually be resetting these only when the governor module is removed and re-installed. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-02-02 01:29:31 +01:00
Viresh Kumar	fcf8058296	cpufreq: Simplify cpufreq_add_dev() Currently cpufreq_add_dev() firsts allocates policy, calls driver->init() and then checks if this CPU is already managed or not. And if it is already managed, its policy is freed. We can save all this if we somehow know that CPU is managed or not in advance. policy->related_cpus contains the list of all valid sibling CPUs of policy->cpu. We can check this to see if the current CPU is already managed. From now on, platforms don't really need to set related_cpus from their init() routines, as the same work is done by core too. If a platform driver needs to set the related_cpus mask with some additional CPUs, other than CPUs present in policy->cpus, they are free to do it, though, as we don't override anything. [rjw: Changelog] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-02-02 00:01:16 +01:00
Viresh Kumar	b26f72042e	cpufreq: Revert "cpufreq: Don't use cpu removed during cpufreq_driver_unregister" This reverts commit 956f339 "cpufreq: Don't use cpu removed during cpufreq_driver_unregister". With the addition of the following commit, this change/variable is not required any more: commit b9ba2725343ae57add3f324dfa5074167f48de96 Author: Viresh Kumar <viresh.kumar@linaro.org> Date: Mon Jan 14 13:23:03 2013 +0000 cpufreq: Simplify __cpufreq_remove_dev() [rjw: Subject and changelog] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-02-02 00:01:16 +01:00
Borislav Petkov	9d95046e5d	cpufreq: Add a get_current_driver helper Add a helper function to return cpufreq_driver->name. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-02-02 00:01:15 +01:00
Dirk Brandewie	d5aaffa9dd	cpufreq: handle cpufreq being disabled for all exported function. When disable_cpufreq() is called some exported functions are still being used that do not have a check for cpufreq being disabled. Add a disabled check into cpufreq_cpu_get() to return NULL if cpufreq is disabled this covers most of the exported functions. For the exported functions that do not call cpufreq_cpu_get() add an explicit check. Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-02-02 00:01:14 +01:00
Viresh Kumar	b8eed8af94	cpufreq: Simplify __cpufreq_remove_dev() __cpufreq_remove_dev() is called on multiple occasions: cpufreq_driver unregister and cpu removals. Current implementation of this routine is overly complex without much need. If the cpu to be removed is the policy->cpu, we remove the policy first and add all other cpus again from policy->cpus and then finally call __cpufreq_remove_dev() again to remove the cpu to be deleted. Haahhhh.. There exist a simple solution to removal of a cpu: - Simply use the old policy structure - update its fields like: policy->cpu, etc. - notify any users of cpufreq, which depend on changing policy->cpu Hence this patch, which tries to implement the above theory. It is tested well by myself on ARM big.LITTLE TC2 SoC, which has 5 cores (2 A15 and 3 A7). Both A15's share same struct policy and all A7's share same policy structure. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-02-02 00:01:14 +01:00
Viresh Kumar	6954ca9c8b	cpufreq: Don't use cpu removed during cpufreq_driver_unregister This is how the core works: cpufreq_driver_unregister() - subsys_interface_unregister() - for_each_cpu() call cpufreq_remove_dev(), i.e. 0,1,2,3,4 when we unregister. cpufreq_remove_dev(): - Remove policy node - Call cpufreq_add_dev() for next cpu, sharing mask with removed cpu. i.e. When cpu 0 is removed, we call it for cpu 1. And when called for cpu 2, we call it for cpu 3. - cpufreq_add_dev() would call cpufreq_driver->init() - init would return mask as AND of 2, 3 and 4 for cluster A7. - cpufreq core would do online_cpu && policy->cpus Here is the BUG(). Because cpu hasn't died but we have just unregistered the cpufreq driver, online cpu would still have cpu 2 in it. And so thing go bad again. Solution: Keep cpumask of cpus that are registered with cpufreq core and clear cpus when we get a call from subsys_interface_unregister() via cpufreq_remove_dev(). Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-02-02 00:01:14 +01:00
Viresh Kumar	f6a7409cab	cpufreq: Notify governors when cpus are hot-[un]plugged Because cpufreq core and governors worry only about the online cpus, if a cpu is hot [un]plugged, we must notify governors about it, otherwise be ready to expect something unexpected. We already have notifiers in the form of CPUFREQ_GOV_START/CPUFREQ_GOV_STOP, we just need to call them now. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-02-02 00:01:14 +01:00

1 2 3 4 5 ...

414 Commits (c82bd44437f5d53d1654d9e36a9e4e55610f6624)