watchdog: hpwdt: Update Driver Documentation.

Remove references to deprecated features like NMI sourcing and obsoleted
module parameters. Add details concerning new module parameter pretimeout
and tips to programming it.

Signed-off-by: Jerry Hoemann <jerry.hoemann@hpe.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>

hifive-unleashed-5.1
parent e1c7f79ea5
commit 18bd1963ae

@@ -1,15 +1,12 @@
-Last reviewed: 05/20/2016
+Last reviewed: 08/20/2018
 
 HPE iLO NMI Watchdog Driver
-NMI sourcing for iLO based ProLiant Servers
-Documentation and Driver by
-Thomas Mingarelli
+for iLO based ProLiant Servers
 
 The HPE iLO NMI Watchdog driver is a kernel module that provides basic
-watchdog functionality and the added benefit of NMI sourcing. Both the
-watchdog functionality and the NMI sourcing capability need to be enabled
-by the user. Remember that the two modes are not dependent on one another.
-A user can have the NMI sourcing without the watchdog timer and vice-versa.
+watchdog functionality and handler for the iLO "Generate NMI to System"
+virtual button.
+
 All references to iLO in this document imply it also works on iLO2 and all
 subsequent generations.
 
@@ -21,12 +18,16 @@ Last reviewed: 05/20/2016
 not be updated in a timely fashion and a hardware system reset (also known as
 an Automatic Server Recovery (ASR)) event will occur.
 
-The hpwdt driver also has three (3) module parameters. They are the following:
+The hpwdt driver also has the following module parameters:
 
 soft_margin - allows the user to set the watchdog timer value.
 Default value is 30 seconds.
-allow_kdump - allows the user to save off a kernel dump image after an NMI.
-Default value is 1/ON
+timeout - an alias of soft_margin.
+pretimeout - allows the user to set the watchdog pretimeout value.
+This is the number of seconds before timeout when an
+NMI is delivered to the system. Setting the value to
+zero disables the pretimeout NMI.
+Default value is 9 seconds.
 nowayout - basic watchdog parameter that does not allow the timer to
 be restarted or an impending ASR to be escaped.
 Default value is set when compiling the kernel. If it is set
@@ -37,61 +38,29 @@ Last reviewed: 05/20/2016
 interface to /dev/watchdog can be found in
 Documentation/watchdog/watchdog-api.txt and Documentation/IPMI.txt.
 
-The NMI sourcing capability is disabled by default due to the inability to
-distinguish between "NMI Watchdog Ticks" and "HW generated NMI events" in the
-Linux kernel. What this means is that the hpwdt nmi handler code is called
-each time the NMI signal fires off. This could amount to several thousands of
-NMIs in a matter of seconds. If a user sees the Linux kernel's "dazed and
-confused" message in the logs or if the system gets into a hung state, then
-the hpwdt driver can be reloaded.
+Due to limitations in the iLO hardware, the NMI pretimeout if enabled,
+can only be set to 9 seconds. Attempts to set pretimeout to other
+non-zero values will be rounded, possibly to zero. Users should verify
+the pretimeout value after attempting to set pretimeout or timeout.
 
-1. If the kernel has not been booted with nmi_watchdog turned off then
-edit and place the nmi_watchdog=0 at the end of the currently booting
-kernel line. Depending on your Linux distribution and platform setup:
-For non-UEFI systems
-/boot/grub/grub.conf or
-/boot/grub/menu.lst
-For UEFI systems
-/boot/efi/EFI/distroname/grub.conf or
-/boot/efi/efi/distroname/elilo.conf
-2. reboot the sever
-3. Once the system comes up perform a modprobe -r hpwdt
-4. modprobe /lib/modules/`uname -r`/kernel/drivers/watchdog/hpwdt.ko
+Upon receipt of an NMI from the iLO, the hpwdt driver will initiate a
+panic. This is to allow for a crash dump to be collected. It is incumbent
+upon the user to have properly configured the system for kdump.
 
-Now, the hpwdt can successfully receive and source the NMI and provide a log
-message that details the reason for the NMI (as determined by the HPE BIOS).
+The default Linux kernel behavior upon panic is to print a kernel tombstone
+and loop forever. This is generally not what a watchdog user wants.
 
-Below is a list of NMIs the HPE BIOS understands along with the associated
-code (reason):
+For those wishing to learn more please see:
+Documentation/kdump/kdump.txt
+Documentation/admin-guide/kernel-parameters.txt (panic=)
+Your Linux Distribution specific documentation.
 
-No source found 00h
+If the hpwdt does not receive the NMI associated with an expiring timer,
+the iLO will proceed to reset the system at timeout if the timer hasn't
+been updated.
 
-Uncorrectable Memory Error 01h
+--
 
-ASR NMI 1Bh
+The HPE iLO NMI Watchdog Driver and documentation were originally developed
+by Tom Mingarelli.
 
-PCI Parity Error 20h
-
-NMI Button Press 27h
-
-SB_BUS_NMI 28h
-
-ILO Doorbell NMI 29h
-
-ILO IOP NMI 2Ah
-
-ILO Watchdog NMI 2Bh
-
-Proc Throt NMI 2Ch
-
-Front Side Bus NMI 2Dh
-
-PCI Express Error 2Fh
-
-DMA controller NMI 30h
-
-Hypertransport/CSI Error 31h
-
-
-
--- Tom Mingarelli
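
The updated text asks users to verify the pretimeout value after setting
pretimeout or timeout, and notes that the default panic behavior is to loop
forever. The following is a minimal Python sketch of those two checks, not
part of the commit. It assumes the kernel was built with CONFIG_WATCHDOG_SYSFS,
that hpwdt is registered as watchdog0, and that the panic= setting can be read
from /proc/sys/kernel/panic. The function names are hypothetical.

```python
from pathlib import Path

# Assumed locations; adjust for the system under test.
WATCHDOG_SYSFS = Path("/sys/class/watchdog/watchdog0")  # assumes hpwdt is watchdog0
PANIC_SYSCTL = Path("/proc/sys/kernel/panic")           # runtime view of panic=


def read_int(path):
    """Return the integer stored in a sysfs or procfs file."""
    return int(Path(path).read_text().strip())


def check_pretimeout(sysfs_dir=WATCHDOG_SYSFS):
    """Report the timeout and pretimeout values that are actually in effect."""
    sysfs_dir = Path(sysfs_dir)
    timeout = read_int(sysfs_dir / "timeout")
    pretimeout = read_int(sysfs_dir / "pretimeout")
    if pretimeout == 0:
        # A requested value may have been rounded to zero by the driver.
        note = "pretimeout NMI disabled"
    else:
        note = f"NMI expected {pretimeout}s before the {timeout}s timeout"
    return timeout, pretimeout, note


def check_panic_timeout(sysctl_path=PANIC_SYSCTL):
    """Describe what the kernel does after a panic, based on the panic= value."""
    seconds = read_int(sysctl_path)
    if seconds == 0:
        return "kernel loops forever after panic"
    if seconds < 0:
        return "kernel reboots immediately after panic"
    return f"kernel reboots {seconds}s after panic"
```

Calling check_pretimeout() after loading the module or changing the timeout
shows whether the requested pretimeout survived rounding. Calling
check_panic_timeout() shows whether a panic triggered by the iLO NMI would
leave the system looping instead of rebooting.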