rcu: Update stall-warning documentation
Add documentation of CONFIG_RCU_CPU_STALL_VERBOSE,
CONFIG_RCU_CPU_STALL_INFO, and RCU_STALL_DELAY_DELTA. Describe multiple
stall-warning messages from a single stall, and the timing of the
subsequent messages. Add headings. Remove RCU_SECONDS_TILL_STALL_RECHECK
because this value is now computed at runtime from RCU_CPU_STALL_TIMEOUT,
so that sysfs changes to the timeout value now directly affect the
RCU_SECONDS_TILL_STALL_RECHECK value.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

hifive-unleashed-5.1
parent c13f3757d0
commit 24cd7fd0ea
@@ -12,14 +12,38 @@ CONFIG_RCU_CPU_STALL_TIMEOUT
 This kernel configuration parameter defines the period of time
 that RCU will wait from the beginning of a grace period until it
 issues an RCU CPU stall warning. This time period is normally
-ten seconds.
+sixty seconds.
 
-RCU_SECONDS_TILL_STALL_RECHECK
+This configuration parameter may be changed at runtime via the
+/sys/module/rcutree/parameters/rcu_cpu_stall_timeout, however
+this parameter is checked only at the beginning of a cycle.
+So if you are 30 seconds into a 70-second stall, setting this
+sysfs parameter to (say) five will shorten the timeout for the
+-next- stall, or the following warning for the current stall
+(assuming the stall lasts long enough). It will not affect the
+timing of the next warning for the current stall.
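
The runtime adjustment described in the added text can be sketched as a pair of small helpers. This is only an illustration: the sysfs path is the one quoted in the text, but the helper names are hypothetical, the value is assumed to be a plain integer number of seconds, and writing to the real file needs root on a kernel that exposes the parameter.

```python
from pathlib import Path

# Path quoted in the documentation text; it exists only on kernels that
# expose the rcutree module parameters.
STALL_TIMEOUT = Path("/sys/module/rcutree/parameters/rcu_cpu_stall_timeout")


def read_stall_timeout(path: Path = STALL_TIMEOUT) -> int:
    """Return the current stall timeout, assumed to be in seconds."""
    return int(path.read_text().strip())


def write_stall_timeout(seconds: int, path: Path = STALL_TIMEOUT) -> None:
    """Store a new timeout (hypothetical helper; needs root on a real system)."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    path.write_text(f"{seconds}\n")
```

As the text notes, a value written this way is picked up only at the beginning of a cycle, so it will not move the next warning of a stall that is already in progress.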
 
-This macro defines the period of time that RCU will wait after
-issuing a stall warning until it issues another stall warning
-for the same stall. This time period is normally set to three
-times the check interval plus thirty seconds.
+Stall-warning messages may be enabled and disabled completely via
+/sys/module/rcutree/parameters/rcu_cpu_stall_suppress.
+
+CONFIG_RCU_CPU_STALL_VERBOSE
 
+This kernel configuration parameter causes the stall warning to
+also dump the stacks of any tasks that are blocking the current
+RCU-preempt grace period.
+
+CONFIG_RCU_CPU_STALL_INFO
+
+This kernel configuration parameter causes the stall warning to
+print out additional per-CPU diagnostic information, including
+information on scheduling-clock ticks and RCU's idle-CPU tracking.
+
+RCU_STALL_DELAY_DELTA
+
+Although the lockdep facility is extremely useful, it does add
+some overhead. Therefore, under CONFIG_PROVE_RCU, the
+RCU_STALL_DELAY_DELTA macro allows five extra seconds before
+giving an RCU CPU stall warning message.
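
The arithmetic behind RCU_STALL_DELAY_DELTA can be sketched as follows. This is a minimal illustration, not the kernel's code: the function names are hypothetical, the 60-second default and the five extra seconds come from the text, and the HZ value of 1000 is purely an assumption.

```python
# "five extra seconds" under CONFIG_PROVE_RCU, per the documentation text
RCU_STALL_DELAY_DELTA_SECONDS = 5


def first_warning_delay(timeout_s: int = 60, prove_rcu: bool = False) -> int:
    """Seconds from grace-period start to the first stall warning."""
    extra = RCU_STALL_DELAY_DELTA_SECONDS if prove_rcu else 0
    return timeout_s + extra


def seconds_to_jiffies(seconds: int, hz: int = 1000) -> int:
    """Convert seconds to jiffies; hz=1000 is an assumed tick rate."""
    return seconds * hz
```

With these assumptions, a 60-second timeout under CONFIG_PROVE_RCU gives 65 seconds, or 65000 jiffies at HZ=1000. That would be consistent with the "t=65000 jiffies" in the sample messages, although the text itself does not state which configuration produced them.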
+
 RCU_STALL_RAT_DELAY
 
@@ -64,6 +88,54 @@ INFO: rcu_bh_state detected stalls on CPUs/tasks: { } (detected by 4, 2502 jiffi
 
 This is rare, but does happen from time to time in real life.
 
+If the CONFIG_RCU_CPU_STALL_INFO kernel configuration parameter is set,
+more information is printed with the stall-warning message, for example:
+
+INFO: rcu_preempt detected stall on CPU
+0: (63959 ticks this GP) idle=241/3fffffffffffffff/0
+(t=65000 jiffies)
+
+In kernels with CONFIG_RCU_FAST_NO_HZ, even more information is
+printed:
+
+INFO: rcu_preempt detected stall on CPU
+0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 drain=0 . timer=-1
+(t=65000 jiffies)
+
+The "(64628 ticks this GP)" indicates that this CPU has taken more
+than 64,000 scheduling-clock interrupts during the current stalled
+grace period. If the CPU was not yet aware of the current grace
+period (for example, if it was offline), then this part of the message
+indicates how many grace periods behind the CPU is.
+
+The "idle=" portion of the message prints the dyntick-idle state.
+The hex number before the first "/" is the low-order 12 bits of the
+dynticks counter, which will have an even-numbered value if the CPU is
+in dyntick-idle mode and an odd-numbered value otherwise. The hex
+number between the two "/"s is the value of the nesting, which will
+be a small positive number if in the idle loop and a very large positive
+number (as shown above) otherwise.
+
+For CONFIG_RCU_FAST_NO_HZ kernels, the "drain=0" indicates that the
+CPU is not in the process of trying to force itself into dyntick-idle
+state, the "." indicates that the CPU has not given up forcing RCU
+into dyntick-idle mode (it would be "H" otherwise), and the "timer=-1"
+indicates that the CPU has not recently forced RCU into dyntick-idle
+mode (it would otherwise indicate the number of microseconds remaining
+in this forced state).
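
The field-by-field description of the per-CPU line can be turned into a small parser. This is a sketch under stated assumptions: the regular expression is inferred from the two sample lines only, the class and function names are hypothetical, the "grace periods behind" variant of the first field is not handled because the text does not show its format, and the number after the second "/" is matched but not interpreted because the text does not explain it.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pattern inferred from the two sample lines in the text, not from the kernel.
_CPU_LINE = re.compile(
    r"(?P<cpu>\d+): \((?P<ticks>\d+) ticks this GP\) "
    r"idle=(?P<dynticks>[0-9a-f]+)/(?P<nesting>[0-9a-f]+)/(?P<last>[0-9a-f]+)"
    r"(?: drain=(?P<drain>-?\d+) (?P<flag>[.H]) timer=(?P<timer>-?\d+))?"
)


@dataclass
class CpuStallInfo:
    cpu: int
    ticks: int                      # scheduling-clock interrupts this GP
    dynticks_low12: int             # low-order 12 bits of the dynticks counter
    in_dyntick_idle: bool           # even counter value means dyntick-idle
    nesting: int                    # small if in the idle loop, huge otherwise
    drain: Optional[int] = None     # CONFIG_RCU_FAST_NO_HZ fields, if present
    gave_up: Optional[bool] = None  # "H" means the CPU gave up forcing
    timer: Optional[int] = None


def parse_cpu_line(line: str) -> CpuStallInfo:
    """Parse one per-CPU stall line of the form shown in the samples."""
    m = _CPU_LINE.search(line)
    if m is None:
        raise ValueError(f"unrecognized per-CPU stall line: {line!r}")
    dynticks = int(m["dynticks"], 16)
    info = CpuStallInfo(
        cpu=int(m["cpu"]),
        ticks=int(m["ticks"]),
        dynticks_low12=dynticks,
        in_dyntick_idle=(dynticks % 2 == 0),
        nesting=int(m["nesting"], 16),
    )
    if m["drain"] is not None:
        info.drain = int(m["drain"])
        info.gave_up = m["flag"] == "H"
        info.timer = int(m["timer"])
    return info
```

Applied to the CONFIG_RCU_FAST_NO_HZ sample, the parser reports an odd dynticks value (0xdd5), so the CPU was not in dyntick-idle mode when the stall was detected.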
+
+
+Multiple Warnings From One Stall
+
+If a stall lasts long enough, multiple stall-warning messages will be
+printed for it. The second and subsequent messages are printed at
+longer intervals, so that the time between (say) the first and second
+message will be about three times the interval between the beginning
+of the stall and the first message.
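
The spacing of repeated warnings can be illustrated with a short calculation. This is only a sketch: the text states the factor of about three for the gap between the first and second message, and the code assumes, without confirmation from the text, that every later gap uses the same factor. The function name and parameters are hypothetical.

```python
from typing import List


def warning_times(first_delay_s: int, stall_length_s: int,
                  recheck_factor: int = 3) -> List[int]:
    """Approximate times (seconds since stall start) at which warnings print.

    Assumes each repeat follows the previous warning by
    recheck_factor * first_delay_s seconds.
    """
    if first_delay_s <= 0 or recheck_factor <= 0:
        raise ValueError("delay and factor must be positive")
    times = []
    t = first_delay_s
    while t <= stall_length_s:
        times.append(t)
        t += recheck_factor * first_delay_s
    return times
```

Under these assumptions, a stall lasting 500 seconds with a 60-second first delay would produce warnings at roughly 60, 240, and 420 seconds.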
+
+
+What Causes RCU CPU Stall Warnings?
+
 So your kernel printed an RCU CPU stall warning. The next question is
 "What caused it?" The following problems can result in RCU CPU stall
 warnings:
@@ -128,4 +200,5 @@ is occurring, which will usually be in the function nearest the top of
 that portion of the stack which remains the same from trace to trace.
 If you can reliably trigger the stall, ftrace can be quite helpful.
 
-RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE.
+RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE
+and with RCU's event tracing.