Commit graph

988830 commits

Author SHA1 Message Date
Wong Vee Khee 20e07e2c3c net: stmmac: Add PCI bus info to ethtool driver query output
This patch populates the PCI bus info in the ethtool driver query data.

Users will be able to view PCI bus info using 'ethtool -i <interface>'.

Signed-off-by: Wong Vee Khee <vee.khee.wong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-17 14:25:50 -08:00
David S. Miller 41f1a01bb6 Merge branch 'ptp-ptp_clockmatrix-Fix-output-1-PPS-alignment'
Vincent Cheng says:

====================
ptp: ptp_clockmatrix: Fix output 1 PPS alignment.

This series fixes a race condition that may result in the output clock
not aligned to internal 1 PPS clock.

Part of device initialization is to align the rising edge of output
clocks to the internal rising edge of the 1 PPS clock.  If the system
APLL and DPLL are not locked when this alignment occurs, the alignment
fails and a fixed offset between the internal 1 PPS clock and the
output clock occurs.

If a clock is dynamically enabled after power-up, the output clock
also needs to be aligned to the internal 1 PPS clock.

v3:
Suggested by: Jakub Kicinski <kuba@kernel.org>
- Remove unnecessary 'err' variable
- Increase msleep()/loop accuracy by using jiffies in while()
- No empty lines between variables
- No empty lines between call and the if
- parenthesis around a == b are unnecessary
- Inconsistent \n usage in dev_()
- Remove unnecessary empty line
- Leave string format in place so static code checkers can
  validate arguments

v2:
Suggested by: Richard Cochran <richardcochran@gmail.com>
- Added const to "char * fmt"
- Break unrelated header change into separate patch
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-17 13:49:26 -08:00
Vincent Cheng 77fdb168a3 ptp: ptp_clockmatrix: clean-up - parenthesis around a == b are unnecessary
Code clean-up.

Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-17 13:49:26 -08:00
Vincent Cheng fde3b3a706 ptp: ptp_clockmatrix: Simplify code - remove unnecessary err variable.
Code clean-up.

Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-17 13:49:26 -08:00
Vincent Cheng fcfd37573a ptp: ptp_clockmatrix: Coding style - tighten vertical spacing.
Code clean-up.

* Remove blank line between variable declarations.
* Remove blank line between:
	err = blah(...)

	if (err)
		...
* Remove unnecessary blank line before/after loop constructs.

Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-17 13:49:26 -08:00
Vincent Cheng 1c49d3e947 ptp: ptp_clockmatrix: Clean-up dev_*() messages.
Code clean-up.

* Remove unnecessary \n termination from dev_*() messages.
* Remove 'char *fmt' to define strings to stay within 80 column
  limit.  Not needed since coding guidelines increased to
  100 columns limit.
  Keeping format in place allows static code checkers to
  validate the arguments.
* Tighten up vertical spacing.

Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-17 13:49:26 -08:00
Vincent Cheng 10c270cf25 ptp: ptp_clockmatrix: Remove unused header declarations.
Removed unused header declarations.

Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-17 13:49:25 -08:00
Vincent Cheng e8b4d8b542 ptp: ptp_clockmatrix: Add alignment of 1 PPS to idtcm_perout_enable.
When enabling output using PTP_CLK_REQ_PEROUT, need to align the output
clock to the internal 1 PPS clock.

Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-17 13:49:25 -08:00
Vincent Cheng 797d318654 ptp: ptp_clockmatrix: Add wait_for_sys_apll_dpll_lock.
Part of the device initialization aligns the rising edge of the output
clock to the internal 1 PPS clock. If the system APLL and DPLL is not
locked, then the alignment will fail and there will be a fixed offset
between the internal 1 PPS clock and the output clock.

After loading the device firmware, poll the system APLL and DPLL for
locked state prior to initialization, timing out after 2 seconds.

Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-17 13:49:25 -08:00
David S. Miller 8574908073 Merge branch 'ddwmac-sun8i-cleanup-and-shutdown-hook'
Samuel Holland says:

====================
dwmac-sun8i cleanup and shutdown hook

These patches clean up some things I noticed while fixing suspend/resume
behavior. The first four are minor code improvements. The last one adds
a shutdown hook to minimize power consumption on boards without a PMIC.

Changes v1 to v2:
  - Note the assumption of exclusive reset controller access in patch 3
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-17 13:42:56 -08:00
Samuel Holland 96be41d74f net: stmmac: dwmac-sun8i: Add a shutdown callback
The Ethernet MAC and PHY are usually major consumers of power on boards
which may not be able to fully power off (those with no PMIC). Powering
down the MAC and internal PHY saves power while these boards are "off".

Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-17 13:42:56 -08:00
Samuel Holland 2743aa2450 net: stmmac: dwmac-sun8i: Minor probe function cleanup
Adjust the spacing and use an explicit "return 0" in the success path
to make the function easier to parse.

Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-17 13:42:56 -08:00
Samuel Holland 1c22f54696 net: stmmac: dwmac-sun8i: Use reset_control_reset
Use the appropriate function instead of reimplementing it,
and update the error message to match the code.

Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-17 13:42:56 -08:00
Samuel Holland afac1d34bf net: stmmac: dwmac-sun8i: Remove unnecessary PHY power check
sun8i_dwmac_unpower_internal_phy already checks if the PHY is powered,
so there is no need to do it again here.

Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-17 13:42:56 -08:00
Samuel Holland 557ef2dfb5 net: stmmac: dwmac-sun8i: Return void from PHY unpower
This is a deinitialization function that always returned zero, and that
return value was always ignored. Have it return void instead.

Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-17 13:42:56 -08:00
Heiner Kallweit 80a2a40bd2 r8169: use macro pm_ptr
Use macro pm_ptr(), this helps to avoid some ifdeffery.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-17 13:33:27 -08:00
David S. Miller 32511f8e49 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Add two helper functions to release one table and hooks from
   the netns and netlink event path.

2) Add table ownership infrastructure, this new infrastructure allows
   users to bind a table (and its content) to a process through the
   netlink socket.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-17 13:19:24 -08:00
Florian Fainelli 96313e1db8 net: mdio: Remove of_phy_attach()
We have no in-tree users, also update the sfp-phylink.rst documentation
to indicate that phy_attach_direct() is used instead of of_phy_attach().

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-17 13:17:49 -08:00
Vladimir Oltean 5975655565 net: mscc: ocelot: select PACKING in the Kconfig
Ocelot now uses include/linux/dsa/ocelot.h which makes use of
CONFIG_PACKING to pack/unpack bits into the Injection/Extraction Frame
Headers. So it needs to explicitly select it, otherwise there might be
build errors due to the missing dependency.

Fixes: 40d3f295b5 ("net: mscc: ocelot: use common tag parsing code with DSA")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-17 13:16:29 -08:00
Peter Zijlstra c5e6fc08fe sched,x86: Allow !PREEMPT_DYNAMIC
Allow building x86 with PREEMPT_DYNAMIC=n, this is needed for
PREEMPT_RT as it makes no sense to not have full preemption on
PREEMPT_RT.

Fixes: 8c98e8cf723c ("preempt/dynamic: Provide preempt_schedule[_notrace]() static calls")
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Mike Galbraith <efault@gmx.de>
Link: https://lkml.kernel.org/r/YCK1+JyFNxQnWeXK@hirez.programming.kicks-ass.net
2021-02-17 14:12:43 +01:00
Frederic Weisbecker 4ae7dc97f7 entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point
Following the idle loop model, cleanly check for pending rcuog wakeup
before the last rescheduling point upon resuming to guest mode. This
way we can avoid to do it from rcu_user_enter() with the last resort
self-IPI hack that enforces rescheduling.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-6-frederic@kernel.org
2021-02-17 14:12:43 +01:00
Frederic Weisbecker 47b8ff194c entry: Explicitly flush pending rcuog wakeup before last rescheduling point
Following the idle loop model, cleanly check for pending rcuog wakeup
before the last rescheduling point on resuming to user mode. This
way we can avoid to do it from rcu_user_enter() with the last resort
self-IPI hack that enforces rescheduling.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-5-frederic@kernel.org
2021-02-17 14:12:43 +01:00
Frederic Weisbecker f8bb5cae96 rcu/nocb: Trigger self-IPI on late deferred wake up before user resume
Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
kthread (rcuog) to be serviced.

Unfortunately the call to rcu_user_enter() is already past the last
rescheduling opportunity before we resume to userspace or to guest mode.
We may escape there with the woken task ignored.

The ultimate resort to fix every callsites is to trigger a self-IPI
(nohz_full depends on arch to implement arch_irq_work_raise()) that will
trigger a reschedule on IRQ tail or guest exit.

Eventually every site that want a saner treatment will need to carefully
place a call to rcu_nocb_flush_deferred_wakeup() before the last explicit
need_resched() check upon resume.

Fixes: 96d3fd0d31 (rcu: Break call_rcu() deadlock involving scheduler and perf)
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-4-frederic@kernel.org
2021-02-17 14:12:43 +01:00
Frederic Weisbecker 43789ef3f7 rcu/nocb: Perform deferred wake up before last idle's need_resched() check
Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
kthread (rcuog) to be serviced.

Usually a local wake up happening while running the idle task is handled
in one of the need_resched() checks carefully placed within the idle
loop that can break to the scheduler.

Unfortunately the call to rcu_idle_enter() is already beyond the last
generic need_resched() check and we may halt the CPU with a resched
request unhandled, leaving the task hanging.

Fix this with splitting the rcuog wakeup handling from rcu_idle_enter()
and place it before the last generic need_resched() check in the idle
loop. It is then assumed that no call to call_rcu() will be performed
after that in the idle loop until the CPU is put in low power mode.

Fixes: 96d3fd0d31 (rcu: Break call_rcu() deadlock involving scheduler and perf)
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-3-frederic@kernel.org
2021-02-17 14:12:43 +01:00
Frederic Weisbecker 54b7429eff rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers
Deferred wakeup of rcuog kthreads upon RCU idle mode entry is going to
be handled differently whether initiated by idle, user or guest. Prepare
with pulling that control up to rcu_eqs_enter() callers.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-2-frederic@kernel.org
2021-02-17 14:12:42 +01:00
Juri Lelli e0ee463c93 sched/features: Distinguish between NORMAL and DEADLINE hrtick
The HRTICK feature has traditionally been servicing configurations that
need precise preemptions point for NORMAL tasks. More recently, the
feature has been extended to also service DEADLINE tasks with stringent
runtime enforcement needs (e.g., runtime < 1ms with HZ=1000).

Enabling HRTICK sched feature currently enables the additional timer and
task tick for both classes, which might introduced undesired overhead
for no additional benefit if one needed it only for one of the cases.

Separate HRTICK sched feature in two (and leave the traditional case
name unmodified) so that it can be selectively enabled when needed.

With:

  $ echo HRTICK > /sys/kernel/debug/sched_features

the NORMAL/fair hrtick gets enabled.

With:

  $ echo HRTICK_DL > /sys/kernel/debug/sched_features

the DEADLINE hrtick gets enabled.

Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210208073554.14629-3-juri.lelli@redhat.com
2021-02-17 14:12:42 +01:00
Juri Lelli 156ec6f42b sched/features: Fix hrtick reprogramming
Hung tasks and RCU stall cases were reported on systems which were not
100% busy. Investigation of such unexpected cases (no sign of potential
starvation caused by tasks hogging the system) pointed out that the
periodic sched tick timer wasn't serviced anymore after a certain point
and that caused all machinery that depends on it (timers, RCU, etc.) to
stop working as well. This issues was however only reproducible if
HRTICK was enabled.

Looking at core dumps it was found that the rbtree of the hrtimer base
used also for the hrtick was corrupted (i.e. next as seen from the base
root and actual leftmost obtained by traversing the tree are different).
Same base is also used for periodic tick hrtimer, which might get "lost"
if the rbtree gets corrupted.

Much alike what described in commit 1f71addd34 ("tick/sched: Do not
mess with an enqueued hrtimer") there is a race window between
hrtimer_set_expires() in hrtick_start and hrtimer_start_expires() in
__hrtick_restart() in which the former might be operating on an already
queued hrtick hrtimer, which might lead to corruption of the base.

Use hrtick_start() (which removes the timer before enqueuing it back) to
ensure hrtick hrtimer reprogramming is entirely guarded by the base
lock, so that no race conditions can occur.

Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210208073554.14629-2-juri.lelli@redhat.com
2021-02-17 14:12:42 +01:00
Dietmar Eggemann de40f33e78 sched/deadline: Reduce rq lock contention in dl_add_task_root_domain()
dl_add_task_root_domain() is called during sched domain rebuild:

  rebuild_sched_domains_locked()
    partition_and_rebuild_sched_domains()
      rebuild_root_domains()
         for all top_cpuset descendants:
           update_tasks_root_domain()
             for all tasks of cpuset:
               dl_add_task_root_domain()

Change it so that only the task pi lock is taken to check if the task
has a SCHED_DEADLINE (DL) policy. In case that p is a DL task take the
rq lock as well to be able to safely de-reference root domain's DL
bandwidth structure.

Most of the tasks will have another policy (namely SCHED_NORMAL) and
can now bail without taking the rq lock.

One thing to note here: Even in case that there aren't any DL user
tasks, a slow frequency switching system with cpufreq gov schedutil has
a DL task (sugov) per frequency domain running which participates in DL
bandwidth management.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Quentin Perret <qperret@google.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20210119083542.19856-1-dietmar.eggemann@arm.com
2021-02-17 14:12:42 +01:00
Sven Schnelle b0d6d47896 uprobes: (Re)add missing get_uprobe() in __find_uprobe()
commit c6bc9bd06dff ("rbtree, uprobes: Use rbtree helpers")
accidentally removed the refcount increase. Add it again.

Fixes: c6bc9bd06dff ("rbtree, uprobes: Use rbtree helpers")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210209150711.36778-1-svens@linux.ibm.com
2021-02-17 14:12:42 +01:00
Sebastian Andrzej Siewior f9d34595ae smp: Process pending softirqs in flush_smp_call_function_from_idle()
send_call_function_single_ipi() may wake an idle CPU without sending an
IPI. The woken up CPU will process the SMP-functions in
flush_smp_call_function_from_idle(). Any raised softirq from within the
SMP-function call will not be processed.
Should the CPU have no tasks assigned, then it will go back to idle with
pending softirqs and the NOHZ will rightfully complain.

Process pending softirqs on return from flush_smp_call_function_queue().

Fixes: b2a02fc43a ("smp: Optimize send_call_function_single_ipi()")
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210123201027.3262800-2-bigeasy@linutronix.de
2021-02-17 14:12:42 +01:00
Peter Zijlstra ef72661e28 sched: Harden PREEMPT_DYNAMIC
Use the new EXPORT_STATIC_CALL_TRAMP() / static_call_mod() to unexport
the static_call_key for the PREEMPT_DYNAMIC calls such that modules
can no longer update these calls.

Having modules change/hi-jack the preemption calls would be horrible.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-02-17 14:12:42 +01:00
Josh Poimboeuf 73f44fe19d static_call: Allow module use without exposing static_call_key
When exporting static_call_key; with EXPORT_STATIC_CALL*(), the module
can use static_call_update() to change the function called.  This is
not desirable in general.

Not exporting static_call_key however also disallows usage of
static_call(), since objtool needs the key to construct the
static_call_site.

Solve this by allowing objtool to create the static_call_site using
the trampoline address when it builds a module and cannot find the
static_call_key symbol. The module loader will then try and map the
trampole back to a key before it constructs the normal sites list.

Doing this requires a trampoline -> key associsation, so add another
magic section that keeps those.

Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210127231837.ifddpn7rhwdaepiu@treble
2021-02-17 14:12:42 +01:00
Peter Zijlstra e59e10f8ef sched: Add /debug/sched_preempt
Add a debugfs file to muck about with the preempt mode at runtime.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/YAsGiUYf6NyaTplX@hirez.programming.kicks-ass.net
2021-02-17 14:12:42 +01:00
Peter Zijlstra (Intel) 826bfeb37b preempt/dynamic: Support dynamic preempt with preempt= boot option
Support the preempt= boot option and patch the static call sites
accordingly.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-9-frederic@kernel.org
2021-02-17 14:12:42 +01:00
Peter Zijlstra (Intel) 40607ee97e preempt/dynamic: Provide irqentry_exit_cond_resched() static call
Provide static call to control IRQ preemption (called in CONFIG_PREEMPT)
so that we can override its behaviour when preempt= is overriden.

Since the default behaviour is full preemption, its call is
initialized to provide IRQ preemption when preempt= isn't passed.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-8-frederic@kernel.org
2021-02-17 14:12:42 +01:00
Peter Zijlstra (Intel) 2c9a98d3bc preempt/dynamic: Provide preempt_schedule[_notrace]() static calls
Provide static calls to control preempt_schedule[_notrace]()
(called in CONFIG_PREEMPT) so that we can override their behaviour when
preempt= is overriden.

Since the default behaviour is full preemption, both their calls are
initialized to the arch provided wrapper, if any.

[fweisbec: only define static calls when PREEMPT_DYNAMIC, make it less
           dependent on x86 with __preempt_schedule_func]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-7-frederic@kernel.org
2021-02-17 14:12:42 +01:00
Peter Zijlstra (Intel) b965f1ddb4 preempt/dynamic: Provide cond_resched() and might_resched() static calls
Provide static calls to control cond_resched() (called in !CONFIG_PREEMPT)
and might_resched() (called in CONFIG_PREEMPT_VOLUNTARY) to that we
can override their behaviour when preempt= is overriden.

Since the default behaviour is full preemption, both their calls are
ignored when preempt= isn't passed.

  [fweisbec: branch might_resched() directly to __cond_resched(), only
             define static calls when PREEMPT_DYNAMIC]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-6-frederic@kernel.org
2021-02-17 14:12:42 +01:00
Michal Hocko 6ef869e064 preempt: Introduce CONFIG_PREEMPT_DYNAMIC
Preemption mode selection is currently hardcoded on Kconfig choices.
Introduce a dedicated option to tune preemption flavour at boot time,

This will be only available on architectures efficiently supporting
static calls in order not to tempt with the feature against additional
overhead that might be prohibitive or undesirable.

CONFIG_PREEMPT_DYNAMIC is automatically selected by CONFIG_PREEMPT if
the architecture provides the necessary support (CONFIG_STATIC_CALL_INLINE,
CONFIG_GENERIC_ENTRY, and provide with __preempt_schedule_function() /
__preempt_schedule_notrace_function()).

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
[peterz: relax requirement to HAVE_STATIC_CALL]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-5-frederic@kernel.org
2021-02-17 14:12:24 +01:00
Frederic Weisbecker 29fd01944b static_call: Provide DEFINE_STATIC_CALL_RET0()
DECLARE_STATIC_CALL() must pass the original function targeted for a
given static call. But DEFINE_STATIC_CALL() may want to initialize it as
off. In this case we can't pass NULL (for functions without return value)
or __static_call_return0 (for functions returning a value) directly
to DEFINE_STATIC_CALL() as that may trigger a static call redeclaration
with a different function prototype. Type casts neither can work around
that as they don't get along with typeof().

The proper way to do that for functions that don't return a value is
to use DEFINE_STATIC_CALL_NULL(). But functions returning a actual value
don't have an equivalent yet.

Provide DEFINE_STATIC_CALL_RET0() to solve this situation.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-3-frederic@kernel.org
2021-02-17 14:08:51 +01:00
Peter Zijlstra 3f2a8fc4b1 static_call/x86: Add __static_call_return0()
Provide a stub function that return 0 and wire up the static call site
patching to replace the CALL with a single 5 byte instruction that
clears %RAX, the return value register.

The function can be cast to any function pointer type that has a
single %RAX return (including pointers). Also provide a version that
returns an int for convenience. We are clearing the entire %RAX register
in any case, whether the return value is 32 or 64 bits, since %RAX is
always a scratch register anyway.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-2-frederic@kernel.org
2021-02-17 14:08:43 +01:00
Peter Zijlstra 880cfed3a0 static_call: Pull some static_call declarations to the type headers
Some static call declarations are going to be needed on low level header
files. Move the necessary material to the dedicated static call types
header to avoid inclusion dependency hell.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-4-frederic@kernel.org
2021-02-17 14:08:35 +01:00
Dietmar Eggemann c541bb7835 sched/core: Update task_prio() function header
The description of the RT offset and the values for 'normal' tasks needs
update. Moreover there are DL tasks now.
task_prio() has to stay like it is to guarantee compatibility with the
/proc/<pid>/stat priority field:

  # cat /proc/<pid>/stat | awk '{ print $18; }'

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210128131040.296856-4-dietmar.eggemann@arm.com
2021-02-17 14:08:30 +01:00
Dietmar Eggemann 9d061ba6bc sched: Remove USER_PRIO, TASK_USER_PRIO and MAX_USER_PRIO
The only remaining use of MAX_USER_PRIO (and USER_PRIO) is the
SCALE_PRIO() definition in the PowerPC Cell architecture's Synergistic
Processor Unit (SPU) scheduler. TASK_USER_PRIO isn't used anymore.

Commit fe443ef2ac ("[POWERPC] spusched: Dynamic timeslicing for
SCHED_OTHER") copied SCALE_PRIO() from the task scheduler in v2.6.23.

Commit a4ec24b48d ("sched: tidy up SCHED_RR") removed it from the task
scheduler in v2.6.24.

Commit 3ee237dddc ("sched/prio: Add 3 macros of MAX_NICE, MIN_NICE and
NICE_WIDTH in prio.h") introduced NICE_WIDTH much later.

With:

  MAX_USER_PRIO = USER_PRIO(MAX_PRIO)

                = MAX_PRIO - MAX_RT_PRIO

       MAX_PRIO = MAX_RT_PRIO + NICE_WIDTH

  MAX_USER_PRIO = MAX_RT_PRIO + NICE_WIDTH - MAX_RT_PRIO

  MAX_USER_PRIO = NICE_WIDTH

MAX_USER_PRIO can be replaced by NICE_WIDTH to be able to remove all the
{*_}USER_PRIO defines.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210128131040.296856-3-dietmar.eggemann@arm.com
2021-02-17 14:08:17 +01:00
Dietmar Eggemann ae18ad281e sched: Remove MAX_USER_RT_PRIO
Commit d46523ea32 ("[PATCH] fix MAX_USER_RT_PRIO and MAX_RT_PRIO")
was introduced due to a a small time period in which the realtime patch
set was using different values for MAX_USER_RT_PRIO and MAX_RT_PRIO.

This is no longer true, i.e. now MAX_RT_PRIO == MAX_USER_RT_PRIO.

Get rid of MAX_USER_RT_PRIO and make everything use MAX_RT_PRIO
instead.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210128131040.296856-2-dietmar.eggemann@arm.com
2021-02-17 14:08:11 +01:00
Dietmar Eggemann 71e5f6644f sched/topology: Fix sched_domain_topology_level alloc in sched_init_numa()
Commit "sched/topology: Make sched_init_numa() use a set for the
deduplicating sort" allocates 'i + nr_levels (level)' instead of
'i + nr_levels + 1' sched_domain_topology_level.

This led to an Oops (on Arm64 juno with CONFIG_SCHED_DEBUG):

sched_init_domains
  build_sched_domains()
    __free_domain_allocs()
      __sdt_free() {
	...
        for_each_sd_topology(tl)
	  ...
          sd = *per_cpu_ptr(sdd->sd, j); <--
	  ...
      }

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Barry Song <song.bao.hua@hisilicon.com>
Link: https://lkml.kernel.org/r/6000e39e-7d28-c360-9cd6-8798fd22a9bf@arm.com
2021-02-17 14:08:05 +01:00
Peter Zijlstra 798172b137 rbtree, timerqueue: Use rb_add_cached()
Reduce rbtree boiler plate by using the new helpers.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
2021-02-17 14:08:01 +01:00
Peter Zijlstra 5a7987253e rbtree, rtmutex: Use rb_add_cached()
Reduce rbtree boiler plate by using the new helpers.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
2021-02-17 14:07:57 +01:00
Peter Zijlstra a905e84e64 rbtree, uprobes: Use rbtree helpers
Reduce rbtree boilerplate by using the new helpers.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
2021-02-17 14:07:52 +01:00
Peter Zijlstra a3b8986455 rbtree, perf: Use new rbtree helpers
Reduce rbtree boiler plate by using the new helpers.

One noteworthy change is unification of the various (partial) compare
functions. We construct a subtree match by forcing the sub-order to
always match, see __group_cmp().

Due to 'const' we had to touch cgroup_id().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
2021-02-17 14:07:48 +01:00
Peter Zijlstra 8ecca39483 rbtree, sched/deadline: Use rb_add_cached()
Reduce rbtree boiler plate by using the new helpers.

Make rb_add_cached() / rb_erase_cached() return a pointer to the
leftmost node to aid in updating additional state.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
2021-02-17 14:07:44 +01:00