1
0
Fork 0

Merge remote-tracking branch 'drm/drm-next' into drm-misc-next

drm-next is forwarded to v4.20-rc1, and we need this to make
a patch series apply.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
hifive-unleashed-5.1
Maarten Lankhorst 2018-11-13 10:58:49 +01:00
commit 0ea0397a3a
10025 changed files with 504995 additions and 228244 deletions

View File

@ -323,7 +323,6 @@ ForEachMacros:
- 'protocol_for_each_card'
- 'protocol_for_each_dev'
- 'queue_for_each_hw_ctx'
- 'radix_tree_for_each_contig'
- 'radix_tree_for_each_slot'
- 'radix_tree_for_each_tagged'
- 'rbtree_postorder_for_each_entry_safe'

View File

@ -119,6 +119,13 @@ Mark Brown <broonie@sirena.org.uk>
Mark Yao <markyao0591@gmail.com> <mark.yao@rock-chips.com>
Martin Kepplinger <martink@posteo.de> <martin.kepplinger@theobroma-systems.com>
Martin Kepplinger <martink@posteo.de> <martin.kepplinger@ginzinger.com>
Matthew Wilcox <willy@infradead.org> <matthew.r.wilcox@intel.com>
Matthew Wilcox <willy@infradead.org> <matthew@wil.cx>
Matthew Wilcox <willy@infradead.org> <mawilcox@linuxonhyperv.com>
Matthew Wilcox <willy@infradead.org> <mawilcox@microsoft.com>
Matthew Wilcox <willy@infradead.org> <willy@debian.org>
Matthew Wilcox <willy@infradead.org> <willy@linux.intel.com>
Matthew Wilcox <willy@infradead.org> <willy@parisc-linux.org>
Matthieu CASTET <castet.matthieu@free.fr>
Mauro Carvalho Chehab <mchehab@kernel.org> <mchehab@brturbo.com.br>
Mauro Carvalho Chehab <mchehab@kernel.org> <maurochehab@gmail.com>
@ -153,6 +160,11 @@ Peter Oruba <peter.oruba@amd.com>
Pratyush Anand <pratyush.anand@gmail.com> <pratyush.anand@st.com>
Praveen BP <praveenbp@ti.com>
Qais Yousef <qsyousef@gmail.com> <qais.yousef@imgtec.com>
Oleksij Rempel <linux@rempel-privat.de> <bug-track@fisher-privat.net>
Oleksij Rempel <linux@rempel-privat.de> <external.Oleksij.Rempel@de.bosch.com>
Oleksij Rempel <linux@rempel-privat.de> <fixed-term.Oleksij.Rempel@de.bosch.com>
Oleksij Rempel <linux@rempel-privat.de> <o.rempel@pengutronix.de>
Oleksij Rempel <linux@rempel-privat.de> <ore@pengutronix.de>
Rajesh Shah <rajesh.shah@intel.com>
Ralf Baechle <ralf@linux-mips.org>
Ralf Wildenhues <Ralf.Wildenhues@gmx.de>

View File

@ -1,428 +0,0 @@
This is a brief list of all the files in ./linux/Documentation and what
they contain. If you add a documentation file, please list it here in
alphabetical order as well, or risk being hunted down like a rabid dog.
Please keep the descriptions small enough to fit on one line.
Thanks -- Paul G.
Following translations are available on the WWW:
- Japanese, maintained by the JF Project (jf@listserv.linux.or.jp), at
http://linuxjf.sourceforge.jp/
00-INDEX
- this file.
ABI/
- info on kernel <-> userspace ABI and relative interface stability.
CodingStyle
- nothing here, just a pointer to process/coding-style.rst.
DMA-API.txt
- DMA API, pci_ API & extensions for non-consistent memory machines.
DMA-API-HOWTO.txt
- Dynamic DMA mapping Guide
DMA-ISA-LPC.txt
- How to do DMA with ISA (and LPC) devices.
DMA-attributes.txt
- listing of the various possible attributes a DMA region can have
EDID/
- directory with info on customizing EDID for broken gfx/displays.
IPMI.txt
- info on Linux Intelligent Platform Management Interface (IPMI) Driver.
IRQ-affinity.txt
- how to select which CPU(s) handle which interrupt events on SMP.
IRQ-domain.txt
- info on interrupt numbering and setting up IRQ domains.
IRQ.txt
- description of what an IRQ is.
Intel-IOMMU.txt
- basic info on the Intel IOMMU virtualization support.
Makefile
- It's not of interest for those who aren't touching the build system.
PCI/
- info related to PCI drivers.
RCU/
- directory with info on RCU (read-copy update).
SAK.txt
- info on Secure Attention Keys.
SM501.txt
- Silicon Motion SM501 multimedia companion chip
SubmittingPatches
- nothing here, just a pointer to process/coding-style.rst.
accounting/
- documentation on accounting and taskstats.
acpi/
- info on ACPI-specific hooks in the kernel.
admin-guide/
- info related to Linux users and system admins.
aoe/
- description of AoE (ATA over Ethernet) along with config examples.
arm/
- directory with info about Linux on the ARM architecture.
arm64/
- directory with info about Linux on the 64 bit ARM architecture.
auxdisplay/
- misc. LCD driver documentation (cfag12864b, ks0108).
backlight/
- directory with info on controlling backlights in flat panel displays
block/
- info on the Block I/O (BIO) layer.
blockdev/
- info on block devices & drivers
bt8xxgpio.txt
- info on how to modify a bt8xx video card for GPIO usage.
btmrvl.txt
- info on Marvell Bluetooth driver usage.
bus-devices/
- directory with info on TI GPMC (General Purpose Memory Controller)
bus-virt-phys-mapping.txt
- how to access I/O mapped memory from within device drivers.
cdrom/
- directory with information on the CD-ROM drivers that Linux has.
cgroup-v1/
- cgroups v1 features, including cpusets and memory controller.
cma/
- Continuous Memory Area (CMA) debugfs interface.
conf.py
- It's not of interest for those who aren't touching the build system.
connector/
- docs on the netlink based userspace<->kernel space communication mod.
console/
- documentation on Linux console drivers.
core-api/
- documentation on kernel core components.
cpu-freq/
- info on CPU frequency and voltage scaling.
cpu-hotplug.txt
- document describing CPU hotplug support in the Linux kernel.
cpu-load.txt
- document describing how CPU load statistics are collected.
cpuidle/
- info on CPU_IDLE, CPU idle state management subsystem.
cputopology.txt
- documentation on how CPU topology info is exported via sysfs.
crc32.txt
- brief tutorial on CRC computation
crypto/
- directory with info on the Crypto API.
dcdbas.txt
- information on the Dell Systems Management Base Driver.
debugging-modules.txt
- some notes on debugging modules after Linux 2.6.3.
debugging-via-ohci1394.txt
- how to use firewire like a hardware debugger memory reader.
dell_rbu.txt
- document demonstrating the use of the Dell Remote BIOS Update driver.
dev-tools/
- directory with info on development tools for the kernel.
device-mapper/
- directory with info on Device Mapper.
dmaengine/
- the DMA engine and controller API guides.
devicetree/
- directory with info on device tree files used by OF/PowerPC/ARM
digsig.txt
-info on the Digital Signature Verification API
dma-buf-sharing.txt
- the DMA Buffer Sharing API Guide
docutils.conf
- nothing here. Just a configuration file for docutils.
dontdiff
- file containing a list of files that should never be diff'ed.
driver-api/
- the Linux driver implementer's API guide.
driver-model/
- directory with info about Linux driver model.
early-userspace/
- info about initramfs, klibc, and userspace early during boot.
efi-stub.txt
- How to use the EFI boot stub to bypass GRUB or elilo on EFI systems.
eisa.txt
- info on EISA bus support.
extcon/
- directory with porting guide for Android kernel switch driver.
isa.txt
- info on EISA bus support.
fault-injection/
- dir with docs about the fault injection capabilities infrastructure.
fb/
- directory with info on the frame buffer graphics abstraction layer.
features/
- status of feature implementation on different architectures.
filesystems/
- info on the vfs and the various filesystems that Linux supports.
firmware_class/
- request_firmware() hotplug interface info.
flexible-arrays.txt
- how to make use of flexible sized arrays in linux
fmc/
- information about the FMC bus abstraction
fpga/
- FPGA Manager Core.
futex-requeue-pi.txt
- info on requeueing of tasks from a non-PI futex to a PI futex
gcc-plugins.txt
- GCC plugin infrastructure.
gpio/
- gpio related documentation
gpu/
- directory with information on GPU driver developer's guide.
hid/
- directory with information on human interface devices
highuid.txt
- notes on the change from 16 bit to 32 bit user/group IDs.
hwspinlock.txt
- hardware spinlock provides hardware assistance for synchronization
timers/
- info on the timer related topics
hw_random.txt
- info on Linux support for random number generator in i8xx chipsets.
hwmon/
- directory with docs on various hardware monitoring drivers.
i2c/
- directory with info about the I2C bus/protocol (2 wire, kHz speed).
x86/i386/
- directory with info about Linux on Intel 32 bit architecture.
ia64/
- directory with info about Linux on Intel 64 bit architecture.
ide/
- Information regarding the Enhanced IDE drive.
iio/
- info on industrial IIO configfs support.
index.rst
- main index for the documentation at ReST format.
infiniband/
- directory with documents concerning Linux InfiniBand support.
input/
- info on Linux input device support.
intel_txt.txt
- info on intel Trusted Execution Technology (intel TXT).
io-mapping.txt
- description of io_mapping functions in linux/io-mapping.h
io_ordering.txt
- info on ordering I/O writes to memory-mapped addresses.
ioctl/
- directory with documents describing various IOCTL calls.
iostats.txt
- info on I/O statistics Linux kernel provides.
irqflags-tracing.txt
- how to use the irq-flags tracing feature.
isapnp.txt
- info on Linux ISA Plug & Play support.
isdn/
- directory with info on the Linux ISDN support, and supported cards.
kbuild/
- directory with info about the kernel build process.
kdump/
- directory with mini HowTo on getting the crash dump code to work.
doc-guide/
- how to write and format reStructuredText kernel documentation
kernel-per-CPU-kthreads.txt
- List of all per-CPU kthreads and how they introduce jitter.
kobject.txt
- info of the kobject infrastructure of the Linux kernel.
kprobes.txt
- documents the kernel probes debugging feature.
kref.txt
- docs on adding reference counters (krefs) to kernel objects.
laptops/
- directory with laptop related info and laptop driver documentation.
ldm.txt
- a brief description of LDM (Windows Dynamic Disks).
leds/
- directory with info about LED handling under Linux.
livepatch/
- info on kernel live patching.
locking/
- directory with info about kernel locking primitives
lockup-watchdogs.txt
- info on soft and hard lockup detectors (aka nmi_watchdog).
logo.gif
- full colour GIF image of Linux logo (penguin - Tux).
logo.txt
- info on creator of above logo & site to get additional images from.
lsm.txt
- Linux Security Modules: General Security Hooks for Linux
lzo.txt
- kernel LZO decompressor input formats
m68k/
- directory with info about Linux on Motorola 68k architecture.
mailbox.txt
- How to write drivers for the common mailbox framework (IPC).
md/
- directory with info about Linux Software RAID
media/
- info on media drivers: uAPI, kAPI and driver documentation.
memory-barriers.txt
- info on Linux kernel memory barriers.
memory-devices/
- directory with info on parts like the Texas Instruments EMIF driver
memory-hotplug.txt
- Hotpluggable memory support, how to use and current status.
men-chameleon-bus.txt
- info on MEN chameleon bus.
mic/
- Intel Many Integrated Core (MIC) architecture device driver.
mips/
- directory with info about Linux on MIPS architecture.
misc-devices/
- directory with info about devices using the misc dev subsystem
mmc/
- directory with info about the MMC subsystem
mtd/
- directory with info about memory technology devices (flash)
namespaces/
- directory with various information about namespaces
netlabel/
- directory with information on the NetLabel subsystem.
networking/
- directory with info on various aspects of networking with Linux.
nfc/
- directory relating info about Near Field Communications support.
nios2/
- Linux on the Nios II architecture.
nommu-mmap.txt
- documentation about no-mmu memory mapping support.
numastat.txt
- info on how to read Numa policy hit/miss statistics in sysfs.
ntb.txt
- info on Non-Transparent Bridge (NTB) drivers.
nvdimm/
- info on non-volatile devices.
nvmem/
- info on non volatile memory framework.
output/
- default directory where html/LaTeX/pdf files will be written.
padata.txt
- An introduction to the "padata" parallel execution API
parisc/
- directory with info on using Linux on PA-RISC architecture.
parport-lowlevel.txt
- description and usage of the low level parallel port functions.
pcmcia/
- info on the Linux PCMCIA driver.
percpu-rw-semaphore.txt
- RCU based read-write semaphore optimized for locking for reading
perf/
- info about the APM X-Gene SoC Performance Monitoring Unit (PMU).
phy/
- ino on Samsung USB 2.0 PHY adaptation layer.
phy.txt
- Description of the generic PHY framework.
pi-futex.txt
- documentation on lightweight priority inheritance futexes.
pinctrl.txt
- info on pinctrl subsystem and the PINMUX/PINCONF and drivers
platform/
- List of supported hardware by compal and Dell laptop.
pnp.txt
- Linux Plug and Play documentation.
power/
- directory with info on Linux PCI power management.
powerpc/
- directory with info on using Linux with the PowerPC.
prctl/
- directory with info on the priveledge control subsystem
preempt-locking.txt
- info on locking under a preemptive kernel.
process/
- how to work with the mainline kernel development process.
pps/
- directory with information on the pulse-per-second support
pti/
- directory with info on Intel MID PTI.
ptp/
- directory with info on support for IEEE 1588 PTP clocks in Linux.
pwm.txt
- info on the pulse width modulation driver subsystem
rapidio/
- directory with info on RapidIO packet-based fabric interconnect
rbtree.txt
- info on what red-black trees are and what they are for.
remoteproc.txt
- info on how to handle remote processor (e.g. AMP) offloads/usage.
rfkill.txt
- info on the radio frequency kill switch subsystem/support.
robust-futex-ABI.txt
- documentation of the robust futex ABI.
robust-futexes.txt
- a description of what robust futexes are.
rpmsg.txt
- info on the Remote Processor Messaging (rpmsg) Framework
rtc.txt
- notes on how to use the Real Time Clock (aka CMOS clock) driver.
s390/
- directory with info on using Linux on the IBM S390.
scheduler/
- directory with info on the scheduler.
scsi/
- directory with info on Linux scsi support.
security/
- directory that contains security-related info
serial/
- directory with info on the low level serial API.
sgi-ioc4.txt
- description of the SGI IOC4 PCI (multi function) device.
sh/
- directory with info on porting Linux to a new architecture.
smsc_ece1099.txt
-info on the smsc Keyboard Scan Expansion/GPIO Expansion device.
sound/
- directory with info on sound card support.
spi/
- overview of Linux kernel Serial Peripheral Interface (SPI) support.
sphinx/
- no documentation here, just files required by Sphinx toolchain.
sphinx-static/
- no documentation here, just files required by Sphinx toolchain.
static-keys.txt
- info on how static keys allow debug code in hotpaths via patching
svga.txt
- short guide on selecting video modes at boot via VGA BIOS.
sync_file.txt
- Sync file API guide.
sysctl/
- directory with info on the /proc/sys/* files.
target/
- directory with info on generating TCM v4 fabric .ko modules
tee.txt
- info on the TEE subsystem and drivers
this_cpu_ops.txt
- List rationale behind and the way to use this_cpu operations.
thermal/
- directory with information on managing thermal issues (CPU/temp)
trace/
- directory with info on tracing technologies within linux
translations/
- translations of this document from English to another language
unaligned-memory-access.txt
- info on how to avoid arch breaking unaligned memory access in code.
unshare.txt
- description of the Linux unshare system call.
usb/
- directory with info regarding the Universal Serial Bus.
vfio.txt
- info on Virtual Function I/O used in guest/hypervisor instances.
video-output.txt
- sysfs class driver interface to enable/disable a video output device.
virtual/
- directory with information on the various linux virtualizations.
vm/
- directory with info on the Linux vm code.
w1/
- directory with documents regarding the 1-wire (w1) subsystem.
watchdog/
- how to auto-reboot Linux if it has "fallen and can't get up". ;-)
wimax/
- directory with info about Intel Wireless Wimax Connections
core-api/workqueue.rst
- information on the Concurrency Managed Workqueue implementation
x86/x86_64/
- directory with info on Linux support for AMD x86-64 (Hammer) machines.
xillybus.txt
- Overview and basic ui of xillybus driver
xtensa/
- directory with documents relating to arch/xtensa port/implementation
xz.txt
- how to make use of the XZ data compression within linux kernel
zorro.txt
- info on writing drivers for Zorro bus devices found on Amigas.

View File

@ -25,38 +25,3 @@ Description:
4.2.2.
The files are read only.
What: /sys/bus/usb/drivers/usbtmc/*/TermChar
Date: August 2008
Contact: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Description:
This file is the TermChar value to be sent to the USB TMC
device as described by the document, "Universal Serial Bus Test
and Measurement Class Specification
(USBTMC) Revision 1.0" as published by the USB-IF.
Note that the TermCharEnabled file determines if this value is
sent to the device or not.
What: /sys/bus/usb/drivers/usbtmc/*/TermCharEnabled
Date: August 2008
Contact: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Description:
This file determines if the TermChar is to be sent to the
device on every transaction or not. For more details about
this, please see the document, "Universal Serial Bus Test and
Measurement Class Specification (USBTMC) Revision 1.0" as
published by the USB-IF.
What: /sys/bus/usb/drivers/usbtmc/*/auto_abort
Date: August 2008
Contact: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Description:
This file determines if the transaction of the USB TMC
device is to be automatically aborted if there is any error.
For more details about this, please see the document,
"Universal Serial Bus Test and Measurement Class Specification
(USBTMC) Revision 1.0" as published by the USB-IF.

View File

@ -0,0 +1,41 @@
What: /config/stp-policy/<device>:p_sys-t.<policy>/<node>/uuid
Date: June 2018
KernelVersion: 4.19
Description:
UUID source identifier string, RW.
Default value is randomly generated at the mkdir <node> time.
Data coming from trace sources that use this <node> will be
tagged with this UUID in the MIPI SyS-T packet stream, to
allow the decoder to discern between different sources
within the same master/channel range, and identify the
higher level decoders that may be needed for each source.
What: /config/stp-policy/<device>:p_sys-t.<policy>/<node>/do_len
Date: June 2018
KernelVersion: 4.19
Description:
Include payload length in the MIPI SyS-T header, boolean.
If enabled, the SyS-T protocol encoder will include payload
length in each packet's metadata. This is normally redundant
if the underlying transport protocol supports marking message
boundaries (which STP does), so this is off by default.
What: /config/stp-policy/<device>:p_sys-t.<policy>/<node>/ts_interval
Date: June 2018
KernelVersion: 4.19
Description:
Time interval in milliseconds. Include a timestamp in the
MIPI SyS-T packet metadata, if this many milliseconds have
passed since the previous packet from this source. Zero is
the default and stands for "never send the timestamp".
What: /config/stp-policy/<device>:p_sys-t.<policy>/<node>/clocksync_interval
Date: June 2018
KernelVersion: 4.19
Description:
Time interval in milliseconds. Send a CLOCKSYNC packet if
this many milliseconds have passed since the previous
CLOCKSYNC packet from this source. Zero is the default and
stands for "never send the CLOCKSYNC". It makes sense to
use this option with sources that generate constant and/or
periodic data, like stm_heartbeat.

View File

@ -12,6 +12,10 @@ Date: Dec 2014
KernelVersion: 4.0
Description: Control descriptors
All attributes read only:
bInterfaceNumber - USB interface number for this
streaming interface
What: /config/usb-gadget/gadget/functions/uvc.name/control/class
Date: Dec 2014
KernelVersion: 4.0
@ -109,6 +113,10 @@ Date: Dec 2014
KernelVersion: 4.0
Description: Streaming descriptors
All attributes read only:
bInterfaceNumber - USB interface number for this
streaming interface
What: /config/usb-gadget/gadget/functions/uvc.name/streaming/class
Date: Dec 2014
KernelVersion: 4.0
@ -160,6 +168,10 @@ Description: Specific MJPEG format descriptors
All attributes read only,
except bmaControls and bDefaultFrameIndex:
bFormatIndex - unique id for this format descriptor;
only defined after parent header is
linked into the streaming class;
read-only
bmaControls - this format's data for bmaControls in
the streaming header
bmInterfaceFlags - specifies interlace information,
@ -177,6 +189,10 @@ Date: Dec 2014
KernelVersion: 4.0
Description: Specific MJPEG frame descriptors
bFrameIndex - unique id for this framedescriptor;
only defined after parent format is
linked into the streaming header;
read-only
dwFrameInterval - indicates how frame interval can be
programmed; a number of values
separated by newline can be specified
@ -204,6 +220,10 @@ Date: Dec 2014
KernelVersion: 4.0
Description: Specific uncompressed format descriptors
bFormatIndex - unique id for this format descriptor;
only defined after parent header is
linked into the streaming class;
read-only
bmaControls - this format's data for bmaControls in
the streaming header
bmInterfaceFlags - specifies interlace information,
@ -224,6 +244,10 @@ Date: Dec 2014
KernelVersion: 4.0
Description: Specific uncompressed frame descriptors
bFrameIndex - unique id for this framedescriptor;
only defined after parent format is
linked into the streaming header;
read-only
dwFrameInterval - indicates how frame interval can be
programmed; a number of values
separated by newline can be specified

View File

@ -199,7 +199,7 @@ Description:
What: /sys/bus/iio/devices/iio:deviceX/in_positionrelative_x_raw
What: /sys/bus/iio/devices/iio:deviceX/in_positionrelative_y_raw
KernelVersion: 4.18
KernelVersion: 4.19
Contact: linux-iio@vger.kernel.org
Description:
Relative position in direction x or y on a pad (may be

View File

@ -323,3 +323,27 @@ Description:
This is similar to /sys/bus/pci/drivers_autoprobe, but
affects only the VFs associated with a specific PF.
What: /sys/bus/pci/devices/.../p2pmem/size
Date: November 2017
Contact: Logan Gunthorpe <logang@deltatee.com>
Description:
If the device has any Peer-to-Peer memory registered, this
file contains the total amount of memory that the device
provides (in decimal).
What: /sys/bus/pci/devices/.../p2pmem/available
Date: November 2017
Contact: Logan Gunthorpe <logang@deltatee.com>
Description:
If the device has any Peer-to-Peer memory registered, this
file contains the amount of memory that has not been
allocated (in decimal).
What: /sys/bus/pci/devices/.../p2pmem/published
Date: November 2017
Contact: Logan Gunthorpe <logang@deltatee.com>
Description:
If the device has any Peer-to-Peer memory registered, this
file contains a '1' if the memory has been published for
use outside the driver that owns the device.

View File

@ -189,6 +189,16 @@ Description:
The file will read "hotplug", "wired" and "not used" if the
information is available, and "unknown" otherwise.
What: /sys/bus/usb/devices/.../(hub interface)/portX/location
Date: October 2018
Contact: Bjørn Mork <bjorn@mork.no>
Description:
Some platforms provide usb port physical location through
firmware. This is used by the kernel to pair up logical ports
mapping to the same physical connector. The attribute exposes the
raw location value as a hex integer.
What: /sys/bus/usb/devices/.../(hub interface)/portX/quirks
Date: May 2018
Contact: Nicolas Boichat <drinkcat@chromium.org>
@ -219,7 +229,14 @@ Description:
ports and report them to the kernel. This attribute is to expose
the number of over-current situation occurred on a specific port
to user space. This file will contain an unsigned 32 bit value
which wraps to 0 after its maximum is reached.
which wraps to 0 after its maximum is reached. This file supports
poll() for monitoring changes to this value in user space.
Any time this value changes the corresponding hub device will send a
udev event with the following attributes:
OVER_CURRENT_PORT=/sys/bus/usb/devices/.../(hub interface)/portX
OVER_CURRENT_COUNT=[current value of this sysfs attribute]
What: /sys/bus/usb/devices/.../(hub interface)/portX/usb3_lpm_permit
Date: November 2015

View File

@ -0,0 +1,21 @@
What: /sys/bus/vmbus/devices/.../driver_override
Date: August 2019
Contact: Stephen Hemminger <sthemmin@microsoft.com>
Description:
This file allows the driver for a device to be specified which
will override standard static and dynamic ID matching. When
specified, only a driver with a name matching the value written
to driver_override will have an opportunity to bind to the
device. The override is specified by writing a string to the
driver_override file (echo uio_hv_generic > driver_override) and
may be cleared with an empty string (echo > driver_override).
This returns the device to standard matching rules binding.
Writing to driver_override does not automatically unbind the
device from its current driver or make any attempt to
automatically load the specified driver. If no driver with a
matching name is currently loaded in the kernel, the device
will not bind to any driver. This also allows devices to
opt-out of driver binding using a driver_override name such as
"none". Only a single driver may be specified in the override,
there is no support for parsing delimiters.

View File

@ -1,27 +0,0 @@
sysfs interface for the S6E63M0 AMOLED LCD panel driver
-------------------------------------------------------
What: /sys/class/lcd/<lcd>/gamma_mode
Date: May, 2010
KernelVersion: v2.6.35
Contact: dri-devel@lists.freedesktop.org
Description:
(RW) Read or write the gamma mode. Following three modes are
supported:
0 - gamma value 2.2,
1 - gamma value 1.9 and
2 - gamma value 1.7.
What: /sys/class/lcd/<lcd>/gamma_table
Date: May, 2010
KernelVersion: v2.6.35
Contact: dri-devel@lists.freedesktop.org
Description:
(RO) Displays the size of the gamma table i.e. the number of
gamma modes available.
This is a backlight lcd driver. These interfaces are an extension to the API
documented in Documentation/ABI/testing/sysfs-class-lcd and in
Documentation/ABI/stable/sysfs-class-backlight (under
/sys/class/backlight/<backlight>/).

View File

@ -0,0 +1,22 @@
What: /sys/class/leds/<led>/hw_pattern
Date: September 2018
KernelVersion: 4.20
Description:
Specify a hardware pattern for the SC27XX LED. For the SC27XX
LED controller, it only supports 4 stages to make a single
hardware pattern, which is used to configure the rise time,
high time, fall time and low time for the breathing mode.
For the breathing mode, the SC27XX LED only expects one brightness
for the high stage. To be compatible with the hardware pattern
format, we should set brightness as 0 for rise stage, fall
stage and low stage.
Min stage duration: 125 ms
Max stage duration: 31875 ms
Since the stage duration step is 125 ms, the duration should be
a multiplier of 125, like 125ms, 250ms, 375ms, 500ms ... 31875ms.
Thus the format of the hardware pattern values should be:
"0 rise_duration brightness high_duration 0 fall_duration 0 low_duration".

View File

@ -0,0 +1,82 @@
What: /sys/class/leds/<led>/pattern
Date: September 2018
KernelVersion: 4.20
Description:
Specify a software pattern for the LED, that supports altering
the brightness for the specified duration with one software
timer. It can do gradual dimming and step change of brightness.
The pattern is given by a series of tuples, of brightness and
duration (ms). The LED is expected to traverse the series and
each brightness value for the specified duration. Duration of
0 means brightness should immediately change to new value, and
writing malformed pattern deactivates any active one.
1. For gradual dimming, the dimming interval now is set as 50
milliseconds. So the tuple with duration less than dimming
interval (50ms) is treated as a step change of brightness,
i.e. the subsequent brightness will be applied without adding
intervening dimming intervals.
The gradual dimming format of the software pattern values should be:
"brightness_1 duration_1 brightness_2 duration_2 brightness_3
duration_3 ...". For example:
echo 0 1000 255 2000 > pattern
It will make the LED go gradually from zero-intensity to max (255)
intensity in 1000 milliseconds, then back to zero intensity in 2000
milliseconds:
LED brightness
^
255-| / \ / \ /
| / \ / \ /
| / \ / \ /
| / \ / \ /
0-| / \/ \/
+---0----1----2----3----4----5----6------------> time (s)
2. To make the LED go instantly from one brigntess value to another,
we should use use zero-time lengths (the brightness must be same as
the previous tuple's). So the format should be:
"brightness_1 duration_1 brightness_1 0 brightness_2 duration_2
brightness_2 0 ...". For example:
echo 0 1000 0 0 255 2000 255 0 > pattern
It will make the LED stay off for one second, then stay at max brightness
for two seconds:
LED brightness
^
255-| +---------+ +---------+
| | | | |
| | | | |
| | | | |
0-| -----+ +----+ +----
+---0----1----2----3----4----5----6------------> time (s)
What: /sys/class/leds/<led>/hw_pattern
Date: September 2018
KernelVersion: 4.20
Description:
Specify a hardware pattern for the LED, for LED hardware that
supports autonomously controlling brightness over time, according
to some preprogrammed hardware patterns. It deactivates any active
software pattern.
Since different LED hardware can have different semantics of
hardware patterns, each driver is expected to provide its own
description for the hardware patterns in their ABI documentation
file.
What: /sys/class/leds/<led>/repeat
Date: September 2018
KernelVersion: 4.20
Description:
Specify a pattern repeat number. -1 means repeat indefinitely,
other negative numbers and number 0 are invalid.
This file will always return the originally written repeat
number.

View File

@ -91,6 +91,24 @@ Description:
stacked (e.g: VLAN interfaces) but still have the same MAC
address as their parent device.
What: /sys/class/net/<iface>/dev_port
Date: February 2014
KernelVersion: 3.15
Contact: netdev@vger.kernel.org
Description:
Indicates the port number of this network device, formatted
as a decimal value. Some NICs have multiple independent ports
on the same PCI bus, device and function. This attribute allows
userspace to distinguish the respective interfaces.
Note: some device drivers started to use 'dev_id' for this
purpose since long before 3.15 and have not adopted the new
attribute ever since. To query the port number, some tools look
exclusively at 'dev_port', while others only consult 'dev_id'.
If a network device has multiple client adapter ports as
described in the previous paragraph and does not set this
attribute to its port number, it's a kernel bug.
What: /sys/class/net/<iface>/dormant
Date: March 2006
KernelVersion: 2.6.17
@ -117,7 +135,7 @@ Description:
full: full duplex
Note: This attribute is only valid for interfaces that implement
the ethtool get_settings method (mostly Ethernet).
the ethtool get_link_ksettings method (mostly Ethernet).
What: /sys/class/net/<iface>/flags
Date: April 2005
@ -224,7 +242,7 @@ Description:
an integer representing the link speed in Mbits/sec.
Note: this attribute is only valid for interfaces that implement
the ethtool get_settings method (mostly Ethernet ).
the ethtool get_link_ksettings method (mostly Ethernet).
What: /sys/class/net/<iface>/tx_queue_len
Date: April 2005

View File

@ -0,0 +1,7 @@
What: /sys/class/net/<iface>/tagging
Date: August 2018
KernelVersion: 4.20
Contact: netdev@vger.kernel.org
Description:
String indicating the type of tagging protocol used by the
DSA slave network device.

View File

@ -121,7 +121,22 @@ What: /sys/fs/f2fs/<disk>/idle_interval
Date: January 2016
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description:
Controls the idle timing.
Controls the idle timing for all paths other than
discard and gc path.
What: /sys/fs/f2fs/<disk>/discard_idle_interval
Date: September 2018
Contact: "Chao Yu" <yuchao0@huawei.com>
Contact: "Sahitya Tummala" <stummala@codeaurora.org>
Description:
Controls the idle timing for discard path.
What: /sys/fs/f2fs/<disk>/gc_idle_interval
Date: September 2018
Contact: "Chao Yu" <yuchao0@huawei.com>
Contact: "Sahitya Tummala" <stummala@codeaurora.org>
Description:
Controls the idle timing for gc path.
What: /sys/fs/f2fs/<disk>/iostat_enable
Date: August 2017

View File

@ -0,0 +1,35 @@
What: /sys/devices/platform/lg-laptop/reader_mode
Date: October 2018
KernelVersion: 4.20
Contact: "Matan Ziv-Av <matan@svgalib.org>
Description:
Control reader mode. 1 means on, 0 means off.
What: /sys/devices/platform/lg-laptop/fn_lock
Date: October 2018
KernelVersion: 4.20
Contact: "Matan Ziv-Av <matan@svgalib.org>
Description:
Control FN lock mode. 1 means on, 0 means off.
What: /sys/devices/platform/lg-laptop/battery_care_limit
Date: October 2018
KernelVersion: 4.20
Contact: "Matan Ziv-Av <matan@svgalib.org>
Description:
Maximal battery charge level. Accepted values are 80 or 100.
What: /sys/devices/platform/lg-laptop/fan_mode
Date: October 2018
KernelVersion: 4.20
Contact: "Matan Ziv-Av <matan@svgalib.org>
Description:
Control fan mode. 1 for performance mode, 0 for silent mode.
What: /sys/devices/platform/lg-laptop/usb_charge
Date: October 2018
KernelVersion: 4.20
Contact: "Matan Ziv-Av <matan@svgalib.org>
Description:
Control USB port charging when device is turned off.
1 means on, 0 means off.

View File

@ -99,7 +99,7 @@ Description:
this file, the suspend image will be as small as possible.
Reading from this file will display the current image size
limit, which is set to 500 MB by default.
limit, which is set to around 2/5 of available RAM by default.
What: /sys/power/pm_trace
Date: August 2006

View File

@ -1,26 +0,0 @@
00-INDEX
- this file
acpi-info.txt
- info on how PCI host bridges are represented in ACPI
MSI-HOWTO.txt
- the Message Signaled Interrupts (MSI) Driver Guide HOWTO and FAQ.
PCIEBUS-HOWTO.txt
- a guide describing the PCI Express Port Bus driver
pci-error-recovery.txt
- info on PCI error recovery
pci-iov-howto.txt
- the PCI Express I/O Virtualization HOWTO
pci.txt
- info on the PCI subsystem for device driver authors
pcieaer-howto.txt
- the PCI Express Advanced Error Reporting Driver Guide HOWTO
endpoint/pci-endpoint.txt
- guide to add endpoint controller driver and endpoint function driver.
endpoint/pci-endpoint-cfs.txt
- guide to use configfs to configure the PCI endpoint function.
endpoint/pci-test-function.txt
- specification of *PCI test* function device.
endpoint/pci-test-howto.txt
- userguide for PCI endpoint test function.
endpoint/function/binding/
- binding documentation for PCI endpoint function

View File

@ -99,17 +99,20 @@ Note that the devices listed here correspond to the value populated in 1.4 above
2.2 Using Endpoint Test function Device
pcitest.sh added in tools/pci/ can be used to run all the default PCI endpoint
tests. Before pcitest.sh can be used pcitest.c should be compiled using the
following commands.
tests. To compile this tool the following commands should be used:
cd <kernel-dir>
make headers_install ARCH=arm
arm-linux-gnueabihf-gcc -Iusr/include tools/pci/pcitest.c -o pcitest
cp pcitest <rootfs>/usr/sbin/
cp tools/pci/pcitest.sh <rootfs>
# cd <kernel-dir>
# make -C tools/pci
or if you desire to compile and install in your system:
# cd <kernel-dir>
# make -C tools/pci install
The tool and script will be located in <rootfs>/usr/bin/
2.2.1 pcitest.sh Output
# ./pcitest.sh
# pcitest.sh
BAR tests
BAR0: OKAY

View File

@ -110,7 +110,7 @@ The actual steps taken by a platform to recover from a PCI error
event will be platform-dependent, but will follow the general
sequence described below.
STEP 0: Error Event: ERR_NONFATAL
STEP 0: Error Event
-------------------
A PCI bus error is detected by the PCI hardware. On powerpc, the slot
is isolated, in that all I/O is blocked: all reads return 0xffffffff,
@ -228,7 +228,13 @@ proceeds to either STEP3 (Link Reset) or to STEP 5 (Resume Operations).
If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform
proceeds to STEP 4 (Slot Reset)
STEP 3: Slot Reset
STEP 3: Link Reset
------------------
The platform resets the link. This is a PCI-Express specific step
and is done whenever a fatal error has been detected that can be
"solved" by resetting the link.
STEP 4: Slot Reset
------------------
In response to a return value of PCI_ERS_RESULT_NEED_RESET, the
@ -314,7 +320,7 @@ Failure).
>>> However, it probably should.
STEP 4: Resume Operations
STEP 5: Resume Operations
-------------------------
The platform will call the resume() callback on all affected device
drivers if all drivers on the segment have returned
@ -326,7 +332,7 @@ a result code.
At this point, if a new error happens, the platform will restart
a new error recovery sequence.
STEP 5: Permanent Failure
STEP 6: Permanent Failure
-------------------------
A "permanent failure" has occurred, and the platform cannot recover
the device. The platform will call error_detected() with a
@ -349,27 +355,6 @@ errors. See the discussion in powerpc/eeh-pci-error-recovery.txt
for additional detail on real-life experience of the causes of
software errors.
STEP 0: Error Event: ERR_FATAL
-------------------
PCI bus error is detected by the PCI hardware. On powerpc, the slot is
isolated, in that all I/O is blocked: all reads return 0xffffffff, all
writes are ignored.
STEP 1: Remove devices
--------------------
Platform removes the devices depending on the error agent, it could be
this port for all subordinates or upstream component (likely downstream
port)
STEP 2: Reset link
--------------------
The platform resets the link. This is a PCI-Express specific step and is
done whenever a fatal error has been detected that can be "solved" by
resetting the link.
STEP 3: Re-enumerate the devices
--------------------
Initiates the re-enumeration.
Conclusion; General Remarks
---------------------------

View File

@ -1,34 +0,0 @@
00-INDEX
- This file
arrayRCU.txt
- Using RCU to Protect Read-Mostly Arrays
checklist.txt
- Review Checklist for RCU Patches
listRCU.txt
- Using RCU to Protect Read-Mostly Linked Lists
lockdep.txt
- RCU and lockdep checking
lockdep-splat.txt
- RCU Lockdep splats explained.
NMI-RCU.txt
- Using RCU to Protect Dynamic NMI Handlers
rcu_dereference.txt
- Proper care and feeding of return values from rcu_dereference()
rcubarrier.txt
- RCU and Unloadable Modules
rculist_nulls.txt
- RCU list primitives for use with SLAB_TYPESAFE_BY_RCU
rcuref.txt
- Reference-count design for elements of lists/arrays protected by RCU
rcu.txt
- RCU Concepts
RTFP.txt
- List of RCU papers (bibliography) going back to 1980.
stallwarn.txt
- RCU CPU stall warnings (module parameter rcu_cpu_stall_suppress)
torture.txt
- RCU Torture Test Operation (CONFIG_RCU_TORTURE_TEST)
UP.txt
- RCU on Uniprocessor Systems
whatisRCU.txt
- What is RCU?

View File

@ -1227,9 +1227,11 @@ to overflow the counter, this approach corrects the
CPU enters the idle loop from process context.
</p><p>The <tt>-&gt;dynticks</tt> field counts the corresponding
CPU's transitions to and from dyntick-idle mode, so that this counter
has an even value when the CPU is in dyntick-idle mode and an odd
value otherwise.
CPU's transitions to and from either dyntick-idle or user mode, so
that this counter has an even value when the CPU is in dyntick-idle
mode or user mode and an odd value otherwise. The transitions to/from
user mode need to be counted for user mode adaptive-ticks support
(see timers/NO_HZ.txt).
</p><p>The <tt>-&gt;rcu_need_heavy_qs</tt> field is used
to record the fact that the RCU core code would really like to
@ -1372,8 +1374,7 @@ that is, if the CPU is currently idle.
Accessor Functions</a></h3>
<p>The following listing shows the
<tt>rcu_get_root()</tt>, <tt>rcu_for_each_node_breadth_first</tt>,
<tt>rcu_for_each_nonleaf_node_breadth_first()</tt>, and
<tt>rcu_get_root()</tt>, <tt>rcu_for_each_node_breadth_first</tt> and
<tt>rcu_for_each_leaf_node()</tt> function and macros:
<pre>
@ -1386,13 +1387,9 @@ Accessor Functions</a></h3>
7 for ((rnp) = &amp;(rsp)-&gt;node[0]; \
8 (rnp) &lt; &amp;(rsp)-&gt;node[NUM_RCU_NODES]; (rnp)++)
9
10 #define rcu_for_each_nonleaf_node_breadth_first(rsp, rnp) \
11 for ((rnp) = &amp;(rsp)-&gt;node[0]; \
12 (rnp) &lt; (rsp)-&gt;level[NUM_RCU_LVLS - 1]; (rnp)++)
13
14 #define rcu_for_each_leaf_node(rsp, rnp) \
15 for ((rnp) = (rsp)-&gt;level[NUM_RCU_LVLS - 1]; \
16 (rnp) &lt; &amp;(rsp)-&gt;node[NUM_RCU_NODES]; (rnp)++)
10 #define rcu_for_each_leaf_node(rsp, rnp) \
11 for ((rnp) = (rsp)-&gt;level[NUM_RCU_LVLS - 1]; \
12 (rnp) &lt; &amp;(rsp)-&gt;node[NUM_RCU_NODES]; (rnp)++)
</pre>
<p>The <tt>rcu_get_root()</tt> simply returns a pointer to the
@ -1405,10 +1402,7 @@ macro takes advantage of the layout of the <tt>rcu_node</tt>
structures in the <tt>rcu_state</tt> structure's
<tt>-&gt;node[]</tt> array, performing a breadth-first traversal by
simply traversing the array in order.
The <tt>rcu_for_each_nonleaf_node_breadth_first()</tt> macro operates
similarly, but traverses only the first part of the array, thus excluding
the leaf <tt>rcu_node</tt> structures.
Finally, the <tt>rcu_for_each_leaf_node()</tt> macro traverses only
Similarly, the <tt>rcu_for_each_leaf_node()</tt> macro traverses only
the last part of the array, thus traversing only the leaf
<tt>rcu_node</tt> structures.
@ -1416,15 +1410,14 @@ the last part of the array, thus traversing only the leaf
<tr><th>&nbsp;</th></tr>
<tr><th align="left">Quick Quiz:</th></tr>
<tr><td>
What do <tt>rcu_for_each_nonleaf_node_breadth_first()</tt> and
What does
<tt>rcu_for_each_leaf_node()</tt> do if the <tt>rcu_node</tt> tree
contains only a single node?
</td></tr>
<tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff">
In the single-node case,
<tt>rcu_for_each_nonleaf_node_breadth_first()</tt> is a no-op
and <tt>rcu_for_each_leaf_node()</tt> traverses the single node.
<tt>rcu_for_each_leaf_node()</tt> traverses the single node.
</font></td></tr>
<tr><td>&nbsp;</td></tr>
</table>

View File

@ -12,10 +12,9 @@ high efficiency and minimal disturbance, expedited grace periods accept
lower efficiency and significant disturbance to attain shorter latencies.
<p>
There are three flavors of RCU (RCU-bh, RCU-preempt, and RCU-sched),
but only two flavors of expedited grace periods because the RCU-bh
expedited grace period maps onto the RCU-sched expedited grace period.
Each of the remaining two implementations is covered in its own section.
There are two flavors of RCU (RCU-preempt and RCU-sched), with an earlier
third RCU-bh flavor having been implemented in terms of the other two.
Each of the two implementations is covered in its own section.
<ol>
<li> <a href="#Expedited Grace Period Design">
@ -158,7 +157,7 @@ whether or not the current CPU is in an RCU read-side critical section.
The best that <tt>sync_sched_exp_handler()</tt> can do is to check
for idle, on the off-chance that the CPU went idle while the IPI
was in flight.
If the CPU is idle, then tt>sync_sched_exp_handler()</tt> reports
If the CPU is idle, then <tt>sync_sched_exp_handler()</tt> reports
the quiescent state.
<p>

View File

@ -1306,8 +1306,6 @@ doing so would degrade real-time response.
<p>
This non-requirement appeared with preemptible RCU.
If you need a grace period that waits on non-preemptible code regions, use
<a href="#Sched Flavor">RCU-sched</a>.
<h2><a name="Parallelism Facts of Life">Parallelism Facts of Life</a></h2>
@ -2165,14 +2163,9 @@ however, this is not a panacea because there would be severe restrictions
on what operations those callbacks could invoke.
<p>
Perhaps surprisingly, <tt>synchronize_rcu()</tt>,
<a href="#Bottom-Half Flavor"><tt>synchronize_rcu_bh()</tt></a>
(<a href="#Bottom-Half Flavor">discussed below</a>),
<a href="#Sched Flavor"><tt>synchronize_sched()</tt></a>,
Perhaps surprisingly, <tt>synchronize_rcu()</tt> and
<tt>synchronize_rcu_expedited()</tt>,
<tt>synchronize_rcu_bh_expedited()</tt>, and
<tt>synchronize_sched_expedited()</tt>
will all operate normally
will operate normally
during very early boot, the reason being that there is only one CPU
and preemption is disabled.
This means that the call <tt>synchronize_rcu()</tt> (or friends)
@ -2269,12 +2262,23 @@ Thankfully, RCU update-side primitives, including
The name notwithstanding, some Linux-kernel architectures
can have nested NMIs, which RCU must handle correctly.
Andy Lutomirski
<a href="https://lkml.kernel.org/g/CALCETrXLq1y7e_dKFPgou-FKHB6Pu-r8+t-6Ds+8=va7anBWDA@mail.gmail.com">surprised me</a>
<a href="https://lkml.kernel.org/r/CALCETrXLq1y7e_dKFPgou-FKHB6Pu-r8+t-6Ds+8=va7anBWDA@mail.gmail.com">surprised me</a>
with this requirement;
he also kindly surprised me with
<a href="https://lkml.kernel.org/g/CALCETrXSY9JpW3uE6H8WYk81sg56qasA2aqmjMPsq5dOtzso=g@mail.gmail.com">an algorithm</a>
<a href="https://lkml.kernel.org/r/CALCETrXSY9JpW3uE6H8WYk81sg56qasA2aqmjMPsq5dOtzso=g@mail.gmail.com">an algorithm</a>
that meets this requirement.
<p>
Furthermore, NMI handlers can be interrupted by what appear to RCU
to be normal interrupts.
One way that this can happen is for code that directly invokes
<tt>rcu_irq_enter()</tt> and </tt>rcu_irq_exit()</tt> to be called
from an NMI handler.
This astonishing fact of life prompted the current code structure,
which has <tt>rcu_irq_enter()</tt> invoking <tt>rcu_nmi_enter()</tt>
and <tt>rcu_irq_exit()</tt> invoking <tt>rcu_nmi_exit()</tt>.
And yes, I also learned of this requirement the hard way.
<h3><a name="Loadable Modules">Loadable Modules</a></h3>
<p>
@ -2394,30 +2398,9 @@ when invoked from a CPU-hotplug notifier.
<p>
RCU depends on the scheduler, and the scheduler uses RCU to
protect some of its data structures.
This means the scheduler is forbidden from acquiring
the runqueue locks and the priority-inheritance locks
in the middle of an outermost RCU read-side critical section unless either
(1)&nbsp;it releases them before exiting that same
RCU read-side critical section, or
(2)&nbsp;interrupts are disabled across
that entire RCU read-side critical section.
This same prohibition also applies (recursively!) to any lock that is acquired
while holding any lock to which this prohibition applies.
Adhering to this rule prevents preemptible RCU from invoking
<tt>rcu_read_unlock_special()</tt> while either runqueue or
priority-inheritance locks are held, thus avoiding deadlock.
<p>
Prior to v4.4, it was only necessary to disable preemption across
RCU read-side critical sections that acquired scheduler locks.
In v4.4, expedited grace periods started using IPIs, and these
IPIs could force a <tt>rcu_read_unlock()</tt> to take the slowpath.
Therefore, this expedited-grace-period change required disabling of
interrupts, not just preemption.
<p>
For RCU's part, the preemptible-RCU <tt>rcu_read_unlock()</tt>
implementation must be written carefully to avoid similar deadlocks.
The preemptible-RCU <tt>rcu_read_unlock()</tt>
implementation must therefore be written carefully to avoid deadlocks
involving the scheduler's runqueue and priority-inheritance locks.
In particular, <tt>rcu_read_unlock()</tt> must tolerate an
interrupt where the interrupt handler invokes both
<tt>rcu_read_lock()</tt> and <tt>rcu_read_unlock()</tt>.
@ -2426,7 +2409,7 @@ negative nesting levels to avoid destructive recursion via
interrupt handler's use of RCU.
<p>
This pair of mutual scheduler-RCU requirements came as a
This scheduler-RCU requirement came as a
<a href="https://lwn.net/Articles/453002/">complete surprise</a>.
<p>
@ -2437,9 +2420,28 @@ when running context-switch-heavy workloads when built with
<tt>CONFIG_NO_HZ_FULL=y</tt>
<a href="http://www.rdrop.com/users/paulmck/scalability/paper/BareMetal.2015.01.15b.pdf">did come as a surprise [PDF]</a>.
RCU has made good progress towards meeting this requirement, even
for context-switch-have <tt>CONFIG_NO_HZ_FULL=y</tt> workloads,
for context-switch-heavy <tt>CONFIG_NO_HZ_FULL=y</tt> workloads,
but there is room for further improvement.
<p>
In the past, it was forbidden to disable interrupts across an
<tt>rcu_read_unlock()</tt> unless that interrupt-disabled region
of code also included the matching <tt>rcu_read_lock()</tt>.
Violating this restriction could result in deadlocks involving the
scheduler's runqueue and priority-inheritance spinlocks.
This restriction was lifted when interrupt-disabled calls to
<tt>rcu_read_unlock()</tt> started deferring the reporting of
the resulting RCU-preempt quiescent state until the end of that
interrupts-disabled region.
This deferred reporting means that the scheduler's runqueue and
priority-inheritance locks cannot be held while reporting an RCU-preempt
quiescent state, which lifts the earlier restriction, at least from
a deadlock perspective.
Unfortunately, real-time systems using RCU priority boosting may
need this restriction to remain in effect because deferred
quiescent-state reporting also defers deboosting, which in turn
degrades real-time latencies.
<h3><a name="Tracing and RCU">Tracing and RCU</a></h3>
<p>
@ -2850,15 +2852,22 @@ The other four flavors are listed below, with requirements for each
described in a separate section.
<ol>
<li> <a href="#Bottom-Half Flavor">Bottom-Half Flavor</a>
<li> <a href="#Sched Flavor">Sched Flavor</a>
<li> <a href="#Bottom-Half Flavor">Bottom-Half Flavor (Historical)</a>
<li> <a href="#Sched Flavor">Sched Flavor (Historical)</a>
<li> <a href="#Sleepable RCU">Sleepable RCU</a>
<li> <a href="#Tasks RCU">Tasks RCU</a>
<li> <a href="#Waiting for Multiple Grace Periods">
Waiting for Multiple Grace Periods</a>
</ol>
<h3><a name="Bottom-Half Flavor">Bottom-Half Flavor</a></h3>
<h3><a name="Bottom-Half Flavor">Bottom-Half Flavor (Historical)</a></h3>
<p>
The RCU-bh flavor of RCU has since been expressed in terms of
the other RCU flavors as part of a consolidation of the three
flavors into a single flavor.
The read-side API remains, and continues to disable softirq and to
be accounted for by lockdep.
Much of the material in this section is therefore strictly historical
in nature.
<p>
The softirq-disable (AKA &ldquo;bottom-half&rdquo;,
@ -2918,8 +2927,20 @@ includes
<tt>call_rcu_bh()</tt>,
<tt>rcu_barrier_bh()</tt>, and
<tt>rcu_read_lock_bh_held()</tt>.
However, the update-side APIs are now simple wrappers for other RCU
flavors, namely RCU-sched in CONFIG_PREEMPT=n kernels and RCU-preempt
otherwise.
<h3><a name="Sched Flavor">Sched Flavor</a></h3>
<h3><a name="Sched Flavor">Sched Flavor (Historical)</a></h3>
<p>
The RCU-sched flavor of RCU has since been expressed in terms of
the other RCU flavors as part of a consolidation of the three
flavors into a single flavor.
The read-side API remains, and continues to disable preemption and to
be accounted for by lockdep.
Much of the material in this section is therefore strictly historical
in nature.
<p>
Before preemptible RCU, waiting for an RCU grace period had the
@ -3139,94 +3160,14 @@ The tasks-RCU API is quite compact, consisting only of
<tt>call_rcu_tasks()</tt>,
<tt>synchronize_rcu_tasks()</tt>, and
<tt>rcu_barrier_tasks()</tt>.
<h3><a name="Waiting for Multiple Grace Periods">
Waiting for Multiple Grace Periods</a></h3>
<p>
Perhaps you have an RCU protected data structure that is accessed from
RCU read-side critical sections, from softirq handlers, and from
hardware interrupt handlers.
That is three flavors of RCU, the normal flavor, the bottom-half flavor,
and the sched flavor.
How to wait for a compound grace period?
<p>
The best approach is usually to &ldquo;just say no!&rdquo; and
insert <tt>rcu_read_lock()</tt> and <tt>rcu_read_unlock()</tt>
around each RCU read-side critical section, regardless of what
environment it happens to be in.
But suppose that some of the RCU read-side critical sections are
on extremely hot code paths, and that use of <tt>CONFIG_PREEMPT=n</tt>
is not a viable option, so that <tt>rcu_read_lock()</tt> and
<tt>rcu_read_unlock()</tt> are not free.
What then?
<p>
You <i>could</i> wait on all three grace periods in succession, as follows:
<blockquote>
<pre>
1 synchronize_rcu();
2 synchronize_rcu_bh();
3 synchronize_sched();
</pre>
</blockquote>
<p>
This works, but triples the update-side latency penalty.
In cases where this is not acceptable, <tt>synchronize_rcu_mult()</tt>
may be used to wait on all three flavors of grace period concurrently:
<blockquote>
<pre>
1 synchronize_rcu_mult(call_rcu, call_rcu_bh, call_rcu_sched);
</pre>
</blockquote>
<p>
But what if it is necessary to also wait on SRCU?
This can be done as follows:
<blockquote>
<pre>
1 static void call_my_srcu(struct rcu_head *head,
2 void (*func)(struct rcu_head *head))
3 {
4 call_srcu(&amp;my_srcu, head, func);
5 }
6
7 synchronize_rcu_mult(call_rcu, call_rcu_bh, call_rcu_sched, call_my_srcu);
</pre>
</blockquote>
<p>
If you needed to wait on multiple different flavors of SRCU
(but why???), you would need to create a wrapper function resembling
<tt>call_my_srcu()</tt> for each SRCU flavor.
<table>
<tr><th>&nbsp;</th></tr>
<tr><th align="left">Quick Quiz:</th></tr>
<tr><td>
But what if I need to wait for multiple RCU flavors, but I also need
the grace periods to be expedited?
</td></tr>
<tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff">
If you are using expedited grace periods, there should be less penalty
for waiting on them in succession.
But if that is nevertheless a problem, you can use workqueues
or multiple kthreads to wait on the various expedited grace
periods concurrently.
</font></td></tr>
<tr><td>&nbsp;</td></tr>
</table>
<p>
Again, it is usually better to adjust the RCU read-side critical sections
to use a single flavor of RCU, but when this is not feasible, you can use
<tt>synchronize_rcu_mult()</tt>.
In <tt>CONFIG_PREEMPT=n</tt> kernels, trampolines cannot be preempted,
so these APIs map to
<tt>call_rcu()</tt>,
<tt>synchronize_rcu()</tt>, and
<tt>rcu_barrier()</tt>, respectively.
In <tt>CONFIG_PREEMPT=y</tt> kernels, trampolines can be preempted,
and these three APIs are therefore implemented by separate functions
that check for voluntary context switches.
<h2><a name="Possible Future Changes">Possible Future Changes</a></h2>
@ -3237,12 +3178,6 @@ If this becomes a serious problem, it will be necessary to rework the
grace-period state machine so as to avoid the need for the additional
latency.
<p>
Expedited grace periods scan the CPUs, so their latency and overhead
increases with increasing numbers of CPUs.
If this becomes a serious problem on large systems, it will be necessary
to do some redesign to avoid this scalability problem.
<p>
RCU disables CPU hotplug in a few places, perhaps most notably in the
<tt>rcu_barrier()</tt> operations.
@ -3287,11 +3222,6 @@ Please note that arrangements that require RCU to remap CPU numbers will
require extremely good demonstration of need and full exploration of
alternatives.
<p>
There is an embarrassingly large number of flavors of RCU, and this
number has been increasing over time.
Perhaps it will be possible to combine some at some future date.
<p>
RCU's various kthreads are reasonably recent additions.
It is quite likely that adjustments will be required to more gracefully

View File

@ -87,7 +87,3 @@ o Where can I find more information on RCU?
See the RTFP.txt file in this directory.
Or point your browser at http://www.rdrop.com/users/paulmck/RCU/.
o What are all these files in this directory?
See 00-INDEX for the list.

View File

@ -16,12 +16,9 @@ o A CPU looping in an RCU read-side critical section.
o A CPU looping with interrupts disabled.
o A CPU looping with preemption disabled. This condition can
result in RCU-sched stalls and, if ksoftirqd is in use, RCU-bh
stalls.
o A CPU looping with preemption disabled.
o A CPU looping with bottom halves disabled. This condition can
result in RCU-sched and RCU-bh stalls.
o A CPU looping with bottom halves disabled.
o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
without invoking schedule(). If the looping in the kernel is
@ -87,9 +84,9 @@ o A hardware failure. This is quite unlikely, but has occurred
This resulted in a series of RCU CPU stall warnings, eventually
leading the realization that the CPU had failed.
The RCU, RCU-sched, RCU-bh, and RCU-tasks implementations have CPU stall
warning. Note that SRCU does -not- have CPU stall warnings. Please note
that RCU only detects CPU stalls when there is a grace period in progress.
The RCU, RCU-sched, and RCU-tasks implementations have CPU stall warning.
Note that SRCU does -not- have CPU stall warnings. Please note that
RCU only detects CPU stalls when there is a grace period in progress.
No grace period, no CPU stall warnings.
To diagnose the cause of the stall, inspect the stack traces.

View File

@ -934,7 +934,8 @@ c. Do you need to treat NMI handlers, hardirq handlers,
d. Do you need RCU grace periods to complete even in the face
of softirq monopolization of one or more of the CPUs? For
example, is your code subject to network-based denial-of-service
attacks? If so, you need RCU-bh.
attacks? If so, you should disable softirq across your readers,
for example, by using rcu_read_lock_bh().
e. Is your workload too update-intensive for normal use of
RCU, but inappropriate for other synchronization mechanisms?

View File

@ -0,0 +1,73 @@
================================
PSI - Pressure Stall Information
================================
:Date: April, 2018
:Author: Johannes Weiner <hannes@cmpxchg.org>
When CPU, memory or IO devices are contended, workloads experience
latency spikes, throughput losses, and run the risk of OOM kills.
Without an accurate measure of such contention, users are forced to
either play it safe and under-utilize their hardware resources, or
roll the dice and frequently suffer the disruptions resulting from
excessive overcommit.
The psi feature identifies and quantifies the disruptions caused by
such resource crunches and the time impact it has on complex workloads
or even entire systems.
Having an accurate measure of productivity losses caused by resource
scarcity aids users in sizing workloads to hardware--or provisioning
hardware according to workload demand.
As psi aggregates this information in realtime, systems can be managed
dynamically using techniques such as load shedding, migrating jobs to
other systems or data centers, or strategically pausing or killing low
priority or restartable batch jobs.
This allows maximizing hardware utilization without sacrificing
workload health or risking major disruptions such as OOM kills.
Pressure interface
==================
Pressure information for each resource is exported through the
respective file in /proc/pressure/ -- cpu, memory, and io.
The format for CPU is as such:
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
and for memory and IO:
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
The "some" line indicates the share of time in which at least some
tasks are stalled on a given resource.
The "full" line indicates the share of time in which all non-idle
tasks are stalled on a given resource simultaneously. In this state
actual CPU cycles are going to waste, and a workload that spends
extended time in this state is considered to be thrashing. This has
severe impact on performance, and it's useful to distinguish this
situation from a state where some tasks are stalled but the CPU is
still doing productive work. As such, time spent in this subset of the
stall state is tracked separately and exported in the "full" averages.
The ratios are tracked as recent trends over ten, sixty, and three
hundred second windows, which gives insight into short term events as
well as medium and long term trends. The total absolute stall time is
tracked and exported as well, to allow detection of latency spikes
which wouldn't necessarily make a dent in the time averages, or to
average trends over custom time frames.
Cgroup2 interface
=================
In a system with a CONFIG_CGROUP=y kernel and the cgroup2 filesystem
mounted, pressure stall information is also tracked for tasks grouped
into cgroups. Each subdirectory in the cgroupfs mountpoint contains
cpu.pressure, memory.pressure, and io.pressure files; the format is
the same as the /proc/pressure/ files.

View File

@ -64,8 +64,8 @@ The sysctl settings (writable only with ``CAP_SYS_PTRACE``) are:
Using ``PTRACE_TRACEME`` is unchanged.
2 - admin-only attach:
only processes with ``CAP_SYS_PTRACE`` may use ptrace
with ``PTRACE_ATTACH``, or through children calling ``PTRACE_TRACEME``.
only processes with ``CAP_SYS_PTRACE`` may use ptrace, either with
``PTRACE_ATTACH`` or through children calling ``PTRACE_TRACEME``.
3 - no attach:
no processes may use ptrace with ``PTRACE_ATTACH`` nor via

View File

@ -51,8 +51,7 @@ Documentation
- There are various README files in the Documentation/ subdirectory:
these typically contain kernel-specific installation notes for some
drivers for example. See Documentation/00-INDEX for a list of what
is contained in each file. Please read the
drivers for example. Please read the
:ref:`Documentation/process/changes.rst <changes>` file, as it
contains information about the problems, which may result by upgrading
your kernel.

View File

@ -966,6 +966,12 @@ All time durations are in microseconds.
$PERIOD duration. "max" for $MAX indicates no limit. If only
one number is written, $MAX is updated.
cpu.pressure
A read-only nested-key file which exists on non-root cgroups.
Shows pressure stall information for CPU. See
Documentation/accounting/psi.txt for details.
Memory
------
@ -1127,6 +1133,10 @@ PAGE_SIZE multiple when read back.
disk readahead. For now OOM in memory cgroup kills
tasks iff shortage has happened inside page fault.
This event is not raised if the OOM killer is not
considered as an option, e.g. for failed high-order
allocations.
oom_kill
The number of processes belonging to this cgroup
killed by any kind of OOM killer.
@ -1271,6 +1281,12 @@ PAGE_SIZE multiple when read back.
higher than the limit for an extended period of time. This
reduces the impact on the workload and memory management.
memory.pressure
A read-only nested-key file which exists on non-root cgroups.
Shows pressure stall information for memory. See
Documentation/accounting/psi.txt for details.
Usage Guidelines
~~~~~~~~~~~~~~~~
@ -1408,6 +1424,12 @@ IO Interface Files
8:16 rbps=2097152 wbps=max riops=max wiops=max
io.pressure
A read-only nested-key file which exists on non-root cgroups.
Shows pressure stall information for IO. See
Documentation/accounting/psi.txt for details.
Writeback
~~~~~~~~~

View File

@ -0,0 +1,574 @@
.. SPDX-License-Identifier: GPL-2.0
========================
ext4 General Information
========================
Ext4 is an advanced level of the ext3 filesystem which incorporates
scalability and reliability enhancements for supporting large filesystems
(64 bit) in keeping with increasing disk capacities and state-of-the-art
feature requirements.
Mailing list: linux-ext4@vger.kernel.org
Web site: http://ext4.wiki.kernel.org
Quick usage instructions
========================
Note: More extensive information for getting started with ext4 can be
found at the ext4 wiki site at the URL:
http://ext4.wiki.kernel.org/index.php/Ext4_Howto
- The latest version of e2fsprogs can be found at:
https://www.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/
or
http://sourceforge.net/project/showfiles.php?group_id=2406
or grab the latest git repository from:
https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
- Create a new filesystem using the ext4 filesystem type:
# mke2fs -t ext4 /dev/hda1
Or to configure an existing ext3 filesystem to support extents:
# tune2fs -O extents /dev/hda1
If the filesystem was created with 128 byte inodes, it can be
converted to use 256 byte for greater efficiency via:
# tune2fs -I 256 /dev/hda1
- Mounting:
# mount -t ext4 /dev/hda1 /wherever
- When comparing performance with other filesystems, it's always
important to try multiple workloads; very often a subtle change in a
workload parameter can completely change the ranking of which
filesystems do well compared to others. When comparing versus ext3,
note that ext4 enables write barriers by default, while ext3 does
not enable write barriers by default. So it is useful to use
explicitly specify whether barriers are enabled or not when via the
'-o barriers=[0|1]' mount option for both ext3 and ext4 filesystems
for a fair comparison. When tuning ext3 for best benchmark numbers,
it is often worthwhile to try changing the data journaling mode; '-o
data=writeback' can be faster for some workloads. (Note however that
running mounted with data=writeback can potentially leave stale data
exposed in recently written files in case of an unclean shutdown,
which could be a security exposure in some situations.) Configuring
the filesystem with a large journal can also be helpful for
metadata-intensive workloads.
Features
========
Currently Available
-------------------
* ability to use filesystems > 16TB (e2fsprogs support not available yet)
* extent format reduces metadata overhead (RAM, IO for access, transactions)
* extent format more robust in face of on-disk corruption due to magics,
* internal redundancy in tree
* improved file allocation (multi-block alloc)
* lift 32000 subdirectory limit imposed by i_links_count[1]
* nsec timestamps for mtime, atime, ctime, create time
* inode version field on disk (NFSv4, Lustre)
* reduced e2fsck time via uninit_bg feature
* journal checksumming for robustness, performance
* persistent file preallocation (e.g for streaming media, databases)
* ability to pack bitmaps and inode tables into larger virtual groups via the
flex_bg feature
* large file support
* inode allocation using large virtual block groups via flex_bg
* delayed allocation
* large block (up to pagesize) support
* efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
the ordering)
[1] Filesystems with a block size of 1k may see a limit imposed by the
directory hash tree having a maximum depth of two.
Options
=======
When mounting an ext4 filesystem, the following option are accepted:
(*) == default
ro
Mount filesystem read only. Note that ext4 will replay the journal (and
thus write to the partition) even when mounted "read only". The mount
options "ro,noload" can be used to prevent writes to the filesystem.
journal_checksum
Enable checksumming of the journal transactions. This will allow the
recovery code in e2fsck and the kernel to detect corruption in the
kernel. It is a compatible change and will be ignored by older
kernels.
journal_async_commit
Commit block can be written to disk without waiting for descriptor
blocks. If enabled older kernels cannot mount the device. This will
enable 'journal_checksum' internally.
journal_path=path, journal_dev=devnum
When the external journal device's major/minor numbers have changed,
these options allow the user to specify the new journal location. The
journal device is identified through either its new major/minor numbers
encoded in devnum, or via a path to the device.
norecovery, noload
Don't load the journal on mounting. Note that if the filesystem was
not unmounted cleanly, skipping the journal replay will lead to the
filesystem containing inconsistencies that can lead to any number of
problems.
data=journal
All data are committed into the journal prior to being written into the
main file system. Enabling this mode will disable delayed allocation
and O_DIRECT support.
data=ordered (*)
All data are forced directly out to the main file system prior to its
metadata being committed to the journal.
data=writeback
Data ordering is not preserved, data may be written into the main file
system after its metadata has been committed to the journal.
commit=nrsec (*)
Ext4 can be told to sync all its data and metadata every 'nrsec'
seconds. The default value is 5 seconds. This means that if you lose
your power, you will lose as much as the latest 5 seconds of work (your
filesystem will not be damaged though, thanks to the journaling). This
default value (or any low value) will hurt performance, but it's good
for data-safety. Setting it to 0 will have the same effect as leaving
it at the default (5 seconds). Setting it to very large values will
improve performance.
barrier=<0|1(*)>, barrier(*), nobarrier
This enables/disables the use of write barriers in the jbd code.
barrier=0 disables, barrier=1 enables. This also requires an IO stack
which can support barriers, and if jbd gets an error on a barrier
write, it will disable again with a warning. Write barriers enforce
proper on-disk ordering of journal commits, making volatile disk write
caches safe to use, at some performance penalty. If your disks are
battery-backed in one way or another, disabling barriers may safely
improve performance. The mount options "barrier" and "nobarrier" can
also be used to enable or disable barriers, for consistency with other
ext4 mount options.
inode_readahead_blks=n
This tuning parameter controls the maximum number of inode table blocks
that ext4's inode table readahead algorithm will pre-read into the
buffer cache. The default value is 32 blocks.
nouser_xattr
Disables Extended User Attributes. See the attr(5) manual page for
more information about extended attributes.
noacl
This option disables POSIX Access Control List support. If ACL support
is enabled in the kernel configuration (CONFIG_EXT4_FS_POSIX_ACL), ACL
is enabled by default on mount. See the acl(5) manual page for more
information about acl.
bsddf (*)
Make 'df' act like BSD.
minixdf
Make 'df' act like Minix.
debug
Extra debugging information is sent to syslog.
abort
Simulate the effects of calling ext4_abort() for debugging purposes.
This is normally used while remounting a filesystem which is already
mounted.
errors=remount-ro
Remount the filesystem read-only on an error.
errors=continue
Keep going on a filesystem error.
errors=panic
Panic and halt the machine if an error occurs. (These mount options
override the errors behavior specified in the superblock, which can be
configured using tune2fs)
data_err=ignore(*)
Just print an error message if an error occurs in a file data buffer in
ordered mode.
data_err=abort
Abort the journal if an error occurs in a file data buffer in ordered
mode.
grpid | bsdgroups
New objects have the group ID of their parent.
nogrpid (*) | sysvgroups
New objects have the group ID of their creator.
resgid=n
The group ID which may use the reserved blocks.
resuid=n
The user ID which may use the reserved blocks.
sb=
Use alternate superblock at this location.
quota, noquota, grpquota, usrquota
These options are ignored by the filesystem. They are used only by
quota tools to recognize volumes where quota should be turned on. See
documentation in the quota-tools package for more details
(http://sourceforge.net/projects/linuxquota).
jqfmt=<quota type>, usrjquota=<file>, grpjquota=<file>
These options tell filesystem details about quota so that quota
information can be properly updated during journal replay. They replace
the above quota options. See documentation in the quota-tools package
for more details (http://sourceforge.net/projects/linuxquota).
stripe=n
Number of filesystem blocks that mballoc will try to use for allocation
size and alignment. For RAID5/6 systems this should be the number of
data disks * RAID chunk size in file system blocks.
delalloc (*)
Defer block allocation until just before ext4 writes out the block(s)
in question. This allows ext4 to better allocation decisions more
efficiently.
nodelalloc
Disable delayed allocation. Blocks are allocated when the data is
copied from userspace to the page cache, either via the write(2) system
call or when an mmap'ed page which was previously unallocated is
written for the first time.
max_batch_time=usec
Maximum amount of time ext4 should wait for additional filesystem
operations to be batch together with a synchronous write operation.
Since a synchronous write operation is going to force a commit and then
a wait for the I/O complete, it doesn't cost much, and can be a huge
throughput win, we wait for a small amount of time to see if any other
transactions can piggyback on the synchronous write. The algorithm
used is designed to automatically tune for the speed of the disk, by
measuring the amount of time (on average) that it takes to finish
committing a transaction. Call this time the "commit time". If the
time that the transaction has been running is less than the commit
time, ext4 will try sleeping for the commit time to see if other
operations will join the transaction. The commit time is capped by
the max_batch_time, which defaults to 15000us (15ms). This
optimization can be turned off entirely by setting max_batch_time to 0.
min_batch_time=usec
This parameter sets the commit time (as described above) to be at least
min_batch_time. It defaults to zero microseconds. Increasing this
parameter may improve the throughput of multi-threaded, synchronous
workloads on very fast disks, at the cost of increasing latency.
journal_ioprio=prio
The I/O priority (from 0 to 7, where 0 is the highest priority) which
should be used for I/O operations submitted by kjournald2 during a
commit operation. This defaults to 3, which is a slightly higher
priority than the default I/O priority.
auto_da_alloc(*), noauto_da_alloc
Many broken applications don't use fsync() when replacing existing
files via patterns such as fd = open("foo.new")/write(fd,..)/close(fd)/
rename("foo.new", "foo"), or worse yet, fd = open("foo",
O_TRUNC)/write(fd,..)/close(fd). If auto_da_alloc is enabled, ext4
will detect the replace-via-rename and replace-via-truncate patterns
and force that any delayed allocation blocks are allocated such that at
the next journal commit, in the default data=ordered mode, the data
blocks of the new file are forced to disk before the rename() operation
is committed. This provides roughly the same level of guarantees as
ext3, and avoids the "zero-length" problem that can happen when a
system crashes before the delayed allocation blocks are forced to disk.
noinit_itable
Do not initialize any uninitialized inode table blocks in the
background. This feature may be used by installation CD's so that the
install process can complete as quickly as possible; the inode table
initialization process would then be deferred until the next time the
file system is unmounted.
init_itable=n
The lazy itable init code will wait n times the number of milliseconds
it took to zero out the previous block group's inode table. This
minimizes the impact on the system performance while file system's
inode table is being initialized.
discard, nodiscard(*)
Controls whether ext4 should issue discard/TRIM commands to the
underlying block device when blocks are freed. This is useful for SSD
devices and sparse/thinly-provisioned LUNs, but it is off by default
until sufficient testing has been done.
nouid32
Disables 32-bit UIDs and GIDs. This is for interoperability with
older kernels which only store and expect 16-bit values.
block_validity(*), noblock_validity
These options enable or disable the in-kernel facility for tracking
filesystem metadata blocks within internal data structures. This
allows multi- block allocator and other routines to notice bugs or
corrupted allocation bitmaps which cause blocks to be allocated which
overlap with filesystem metadata blocks.
dioread_lock, dioread_nolock
Controls whether or not ext4 should use the DIO read locking. If the
dioread_nolock option is specified ext4 will allocate uninitialized
extent before buffer write and convert the extent to initialized after
IO completes. This approach allows ext4 code to avoid using inode
mutex, which improves scalability on high speed storages. However this
does not work with data journaling and dioread_nolock option will be
ignored with kernel warning. Note that dioread_nolock code path is only
used for extent-based files. Because of the restrictions this options
comprises it is off by default (e.g. dioread_lock).
max_dir_size_kb=n
This limits the size of directories so that any attempt to expand them
beyond the specified limit in kilobytes will cause an ENOSPC error.
This is useful in memory constrained environments, where a very large
directory can cause severe performance problems or even provoke the Out
Of Memory killer. (For example, if there is only 512mb memory
available, a 176mb directory may seriously cramp the system's style.)
i_version
Enable 64-bit inode version support. This option is off by default.
dax
Use direct access (no page cache). See
Documentation/filesystems/dax.txt. Note that this option is
incompatible with data=journal.
Data Mode
=========
There are 3 different data modes:
* writeback mode
In data=writeback mode, ext4 does not journal data at all. This mode provides
a similar level of journaling as that of XFS, JFS, and ReiserFS in its default
mode - metadata journaling. A crash+recovery can cause incorrect data to
appear in files which were written shortly before the crash. This mode will
typically provide the best ext4 performance.
* ordered mode
In data=ordered mode, ext4 only officially journals metadata, but it logically
groups metadata information related to data changes with the data blocks into
a single unit called a transaction. When it's time to write the new metadata
out to disk, the associated data blocks are written first. In general, this
mode performs slightly slower than writeback but significantly faster than
journal mode.
* journal mode
data=journal mode provides full data and metadata journaling. All new data is
written to the journal first, and then to its final location. In the event of
a crash, the journal can be replayed, bringing both data and metadata into a
consistent state. This mode is the slowest except when data needs to be read
from and written to disk at the same time where it outperforms all others
modes. Enabling this mode will disable delayed allocation and O_DIRECT
support.
/proc entries
=============
Information about mounted ext4 file systems can be found in
/proc/fs/ext4. Each mounted filesystem will have a directory in
/proc/fs/ext4 based on its device name (i.e., /proc/fs/ext4/hdc or
/proc/fs/ext4/dm-0). The files in each per-device directory are shown
in table below.
Files in /proc/fs/ext4/<devname>
mb_groups
details of multiblock allocator buddy cache of free blocks
/sys entries
============
Information about mounted ext4 file systems can be found in
/sys/fs/ext4. Each mounted filesystem will have a directory in
/sys/fs/ext4 based on its device name (i.e., /sys/fs/ext4/hdc or
/sys/fs/ext4/dm-0). The files in each per-device directory are shown
in table below.
Files in /sys/fs/ext4/<devname>:
(see also Documentation/ABI/testing/sysfs-fs-ext4)
delayed_allocation_blocks
This file is read-only and shows the number of blocks that are dirty in
the page cache, but which do not have their location in the filesystem
allocated yet.
inode_goal
Tuning parameter which (if non-zero) controls the goal inode used by
the inode allocator in preference to all other allocation heuristics.
This is intended for debugging use only, and should be 0 on production
systems.
inode_readahead_blks
Tuning parameter which controls the maximum number of inode table
blocks that ext4's inode table readahead algorithm will pre-read into
the buffer cache.
lifetime_write_kbytes
This file is read-only and shows the number of kilobytes of data that
have been written to this filesystem since it was created.
max_writeback_mb_bump
The maximum number of megabytes the writeback code will try to write
out before move on to another inode.
mb_group_prealloc
The multiblock allocator will round up allocation requests to a
multiple of this tuning parameter if the stripe size is not set in the
ext4 superblock
mb_max_to_scan
The maximum number of extents the multiblock allocator will search to
find the best extent.
mb_min_to_scan
The minimum number of extents the multiblock allocator will search to
find the best extent.
mb_order2_req
Tuning parameter which controls the minimum size for requests (as a
power of 2) where the buddy cache is used.
mb_stats
Controls whether the multiblock allocator should collect statistics,
which are shown during the unmount. 1 means to collect statistics, 0
means not to collect statistics.
mb_stream_req
Files which have fewer blocks than this tunable parameter will have
their blocks allocated out of a block group specific preallocation
pool, so that small files are packed closely together. Each large file
will have its blocks allocated out of its own unique preallocation
pool.
session_write_kbytes
This file is read-only and shows the number of kilobytes of data that
have been written to this filesystem since it was mounted.
reserved_clusters
This is RW file and contains number of reserved clusters in the file
system which will be used in the specific situations to avoid costly
zeroout, unexpected ENOSPC, or possible data loss. The default is 2% or
4096 clusters, whichever is smaller and this can be changed however it
can never exceed number of clusters in the file system. If there is not
enough space for the reserved space when mounting the file mount will
_not_ fail.
Ioctls
======
There is some Ext4 specific functionality which can be accessed by applications
through the system call interfaces. The list of all Ext4 specific ioctls are
shown in the table below.
Table of Ext4 specific ioctls
EXT4_IOC_GETFLAGS
Get additional attributes associated with inode. The ioctl argument is
an integer bitfield, with bit values described in ext4.h. This ioctl is
an alias for FS_IOC_GETFLAGS.
EXT4_IOC_SETFLAGS
Set additional attributes associated with inode. The ioctl argument is
an integer bitfield, with bit values described in ext4.h. This ioctl is
an alias for FS_IOC_SETFLAGS.
EXT4_IOC_GETVERSION, EXT4_IOC_GETVERSION_OLD
Get the inode i_generation number stored for each inode. The
i_generation number is normally changed only when new inode is created
and it is particularly useful for network filesystems. The '_OLD'
version of this ioctl is an alias for FS_IOC_GETVERSION.
EXT4_IOC_SETVERSION, EXT4_IOC_SETVERSION_OLD
Set the inode i_generation number stored for each inode. The '_OLD'
version of this ioctl is an alias for FS_IOC_SETVERSION.
EXT4_IOC_GROUP_EXTEND
This ioctl has the same purpose as the resize mount option. It allows
to resize filesystem to the end of the last existing block group,
further resize has to be done with resize2fs, either online, or
offline. The argument points to the unsigned logn number representing
the filesystem new block count.
EXT4_IOC_MOVE_EXT
Move the block extents from orig_fd (the one this ioctl is pointing to)
to the donor_fd (the one specified in move_extent structure passed as
an argument to this ioctl). Then, exchange inode metadata between
orig_fd and donor_fd. This is especially useful for online
defragmentation, because the allocator has the opportunity to allocate
moved blocks better, ideally into one contiguous extent.
EXT4_IOC_GROUP_ADD
Add a new group descriptor to an existing or new group descriptor
block. The new group descriptor is described by ext4_new_group_input
structure, which is passed as an argument to this ioctl. This is
especially useful in conjunction with EXT4_IOC_GROUP_EXTEND, which
allows online resize of the filesystem to the end of the last existing
block group. Those two ioctls combined is used in userspace online
resize tool (e.g. resize2fs).
EXT4_IOC_MIGRATE
This ioctl operates on the filesystem itself. It converts (migrates)
ext3 indirect block mapped inode to ext4 extent mapped inode by walking
through indirect block mapping of the original inode and converting
contiguous block ranges into ext4 extents of the temporary inode. Then,
inodes are swapped. This ioctl might help, when migrating from ext3 to
ext4 filesystem, however suggestion is to create fresh ext4 filesystem
and copy data from the backup. Note, that filesystem has to support
extents for this ioctl to work.
EXT4_IOC_ALLOC_DA_BLKS
Force all of the delay allocated blocks to be allocated to preserve
application-expected ext3 behaviour. Note that this will also start
triggering a write of the data blocks, but this behaviour may change in
the future as it is not necessary and has been done this way only for
sake of simplicity.
EXT4_IOC_RESIZE_FS
Resize the filesystem to a new size. The number of blocks of resized
filesystem is passed in via 64 bit integer argument. The kernel
allocates bitmaps and inode table, the userspace tool thus just passes
the new number of blocks.
EXT4_IOC_SWAP_BOOT
Swap i_blocks and associated attributes (like i_blocks, i_size,
i_flags, ...) from the specified inode with inode EXT4_BOOT_LOADER_INO
(#5). This is typically used to store a boot loader in a secure part of
the filesystem, where it can't be changed by a normal user by accident.
The data blocks of the previous boot loader will be associated with the
given inode.
References
==========
kernel source: <file:fs/ext4/>
<file:fs/jbd2/>
programs: http://e2fsprogs.sourceforge.net/
useful links: http://fedoraproject.org/wiki/ext3-devel
http://www.bullopensource.org/ext4/
http://ext4.wiki.kernel.org/index.php/Main_Page
http://fedoraproject.org/wiki/Features/Ext4

View File

@ -71,6 +71,7 @@ configure specific aspects of kernel behavior to your liking.
java
ras
bcache
ext4
pm/index
thunderbolt
LSM/index

View File

@ -856,6 +856,11 @@
causing system reset or hang due to sending
INIT from AP to BSP.
disable_counter_freezing [HW]
Disable Intel PMU counter freezing feature.
The feature only exists starting from
Arch Perfmon v4 (Skylake and newer).
disable_ddw [PPC/PSERIES]
Disable Dynamic DMA Window support. Use this if
to workaround buggy firmware.
@ -1063,7 +1068,7 @@
earlyprintk=serial[,0x...[,baudrate]]
earlyprintk=ttySn[,baudrate]
earlyprintk=dbgp[debugController#]
earlyprintk=pciserial,bus:device.function[,baudrate]
earlyprintk=pciserial[,force],bus:device.function[,baudrate]
earlyprintk=xdbc[xhciController#]
earlyprintk is useful when the kernel crashes before
@ -1095,6 +1100,10 @@
The sclp output can only be used on s390.
The optional "force" to "pciserial" enables use of a
PCI device even when its classcode is not of the
UART class.
edac_report= [HW,EDAC] Control how to report EDAC event
Format: {"on" | "off" | "force"}
on: enable EDAC to report H/W event. May be overridden
@ -1385,6 +1394,11 @@
hvc_iucv_allow= [S390] Comma-separated list of z/VM user IDs.
If specified, z/VM IUCV HVC accepts connections
from listed z/VM user IDs only.
hv_nopvspin [X86,HYPER_V] Disables the paravirt spinlock optimizations
which allow the hypervisor to 'idle' the
guest on lock contention.
keep_bootcon [KNL]
Do not unregister boot console at start. This is only
useful for debugging when something happens in the window
@ -1749,12 +1763,24 @@
nobypass [PPC/POWERNV]
Disable IOMMU bypass, using IOMMU for PCI devices.
iommu.strict= [ARM64] Configure TLB invalidation behaviour
Format: { "0" | "1" }
0 - Lazy mode.
Request that DMA unmap operations use deferred
invalidation of hardware TLBs, for increased
throughput at the cost of reduced device isolation.
Will fall back to strict mode if not supported by
the relevant IOMMU driver.
1 - Strict mode (default).
DMA unmap operations invalidate IOMMU hardware TLBs
synchronously.
iommu.passthrough=
[ARM64] Configure DMA to bypass the IOMMU by default.
Format: { "0" | "1" }
0 - Use IOMMU translation for DMA.
1 - Bypass the IOMMU for DMA.
unset - Use IOMMU translation for DMA.
unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH.
io7= [HW] IO7 for Marvel based alpha systems
See comment before marvel_specify_io7 in
@ -2274,6 +2300,8 @@
ltpc= [NET]
Format: <io>,<irq>,<dma>
lsm.debug [SECURITY] Enable LSM initialization debugging output.
machvec= [IA-64] Force the use of a particular machine-vector
(machvec) in a generic kernel.
Example: machvec=hpzx1_swiotlb
@ -2404,7 +2432,7 @@
seconds. Use this parameter to check at some
other rate. 0 disables periodic checking.
memtest= [KNL,X86,ARM] Enable memtest
memtest= [KNL,X86,ARM,PPC] Enable memtest
Format: <integer>
default : 0 <disable>
Specifies the number of memtest passes to be
@ -3540,14 +3568,14 @@
In kernels built with CONFIG_RCU_NOCB_CPU=y, set
the specified list of CPUs to be no-callback CPUs.
Invocation of these CPUs' RCU callbacks will
be offloaded to "rcuox/N" kthreads created for
that purpose, where "x" is "b" for RCU-bh, "p"
for RCU-preempt, and "s" for RCU-sched, and "N"
is the CPU number. This reduces OS jitter on the
offloaded CPUs, which can be useful for HPC and
real-time workloads. It can also improve energy
efficiency for asymmetric multiprocessors.
Invocation of these CPUs' RCU callbacks will be
offloaded to "rcuox/N" kthreads created for that
purpose, where "x" is "p" for RCU-preempt, and
"s" for RCU-sched, and "N" is the CPU number.
This reduces OS jitter on the offloaded CPUs,
which can be useful for HPC and real-time
workloads. It can also improve energy efficiency
for asymmetric multiprocessors.
rcu_nocb_poll [KNL]
Rather than requiring that offloaded CPUs
@ -3601,7 +3629,14 @@
Set required age in jiffies for a
given grace period before RCU starts
soliciting quiescent-state help from
rcu_note_context_switch().
rcu_note_context_switch(). If not specified, the
kernel will calculate a value based on the most
recent settings of rcutree.jiffies_till_first_fqs
and rcutree.jiffies_till_next_fqs.
This calculated value may be viewed in
rcutree.jiffies_to_sched_qs. Any attempt to
set rcutree.jiffies_to_sched_qs will be
cheerfully overwritten.
rcutree.jiffies_till_first_fqs= [KNL]
Set delay from grace-period initialization to
@ -3869,12 +3904,6 @@
rcupdate.rcu_self_test= [KNL]
Run the RCU early boot self tests
rcupdate.rcu_self_test_bh= [KNL]
Run the RCU bh early boot self tests
rcupdate.rcu_self_test_sched= [KNL]
Run the RCU sched early boot self tests
rdinit= [KNL]
Format: <full_path>
Run specified binary instead of /init from the ramdisk,
@ -4610,7 +4639,8 @@
usbcore.old_scheme_first=
[USB] Start with the old device initialization
scheme (default 0 = off).
scheme, applies only to low and full-speed devices
(default 0 = off).
usbcore.usbfs_memory_mb=
[USB] Memory limit (in MB) for buffers allocated by
@ -4825,6 +4855,18 @@
This is actually a boot loader parameter; the value is
passed to the kernel using a special protocol.
vm_debug[=options] [KNL] Available with CONFIG_DEBUG_VM=y.
May slow down system boot speed, especially when
enabled on systems with a large amount of memory.
All options are enabled by default, and this
interface is meant to allow for selectively
enabling or disabling specific virtual memory
debugging features.
Available options are:
P Enable page structure init time poisoning
- Disable all of the above options
vmalloc=nn[KMG] [KNL,BOOT] Forces the vmalloc area to have an exact
size of <nn>. This can be used to increase the
minimum size (128MB on x86). It can also be used to

View File

@ -553,7 +553,7 @@ When nested virtualization is in use, three operating systems are involved:
the bare metal hypervisor, the nested hypervisor and the nested virtual
machine. VMENTER operations from the nested hypervisor into the nested
guest will always be processed by the bare metal hypervisor. If KVM is the
bare metal hypervisor it wiil:
bare metal hypervisor it will:
- Flush the L1D cache on every switch from the nested hypervisor to the
nested virtual machine, so that the nested hypervisor's secrets are not

View File

@ -29,6 +29,7 @@ the Linux memory management.
hugetlbpage
idle_page_tracking
ksm
memory-hotplug
numa_memory_policy
pagemap
soft-dirty

View File

@ -1,47 +1,29 @@
.. _admin_guide_memory_hotplug:
==============
Memory Hotplug
==============
:Created: Jul 28 2007
:Updated: Add description of notifier of memory hotplug: Oct 11 2007
:Updated: Add some details about locking internals: Aug 20 2018
This document is about memory hotplug including how-to-use and current status.
Because Memory Hotplug is still under development, contents of this text will
be changed often.
.. CONTENTS
1. Introduction
1.1 purpose of memory hotplug
1.2. Phases of memory hotplug
1.3. Unit of Memory online/offline operation
2. Kernel Configuration
3. sysfs files for memory hotplug
4. Physical memory hot-add phase
4.1 Hardware(Firmware) Support
4.2 Notify memory hot-add event by hand
5. Logical Memory hot-add phase
5.1. State of memory
5.2. How to online memory
6. Logical memory remove
6.1 Memory offline and ZONE_MOVABLE
6.2. How to offline memory
7. Physical memory remove
8. Memory hotplug event notifier
9. Future Work List
.. contents:: :local:
.. note::
(1) x86_64's has special implementation for memory hotplug.
This text does not describe it.
(2) This text assumes that sysfs is mounted at /sys.
(2) This text assumes that sysfs is mounted at ``/sys``.
Introduction
============
purpose of memory hotplug
Purpose of memory hotplug
-------------------------
Memory Hotplug allows users to increase/decrease the amount of memory.
@ -57,7 +39,6 @@ hardware which supports memory power management.
Linux memory hotplug is designed for both purpose.
Phases of memory hotplug
------------------------
@ -92,7 +73,6 @@ phase by hand.
(However, if you writes udev's hotplug scripts for memory hotplug, these
phases can be execute in seamless way.)
Unit of Memory online/offline operation
---------------------------------------
@ -107,10 +87,9 @@ unit upon which memory online/offline operations are to be performed. The
default size of a memory block is the same as memory section size unless an
architecture specifies otherwise. (see :ref:`memory_hotplug_sysfs_files`.)
To determine the size (in bytes) of a memory block please read this file:
/sys/devices/system/memory/block_size_bytes
To determine the size (in bytes) of a memory block please read this file::
/sys/devices/system/memory/block_size_bytes
Kernel Configuration
====================
@ -119,22 +98,22 @@ To use memory hotplug feature, kernel must be compiled with following
config options.
- For all memory hotplug:
- Memory model -> Sparse Memory (CONFIG_SPARSEMEM)
- Allow for memory hot-add (CONFIG_MEMORY_HOTPLUG)
- Memory model -> Sparse Memory (``CONFIG_SPARSEMEM``)
- Allow for memory hot-add (``CONFIG_MEMORY_HOTPLUG``)
- To enable memory removal, the following are also necessary:
- Allow for memory hot remove (CONFIG_MEMORY_HOTREMOVE)
- Page Migration (CONFIG_MIGRATION)
- Allow for memory hot remove (``CONFIG_MEMORY_HOTREMOVE``)
- Page Migration (``CONFIG_MIGRATION``)
- For ACPI memory hotplug, the following are also necessary:
- Memory hotplug (under ACPI Support menu) (CONFIG_ACPI_HOTPLUG_MEMORY)
- Memory hotplug (under ACPI Support menu) (``CONFIG_ACPI_HOTPLUG_MEMORY``)
- This option can be kernel module.
- As a related configuration, if your box has a feature of NUMA-node hotplug
via ACPI, then this option is necessary too.
- ACPI0004,PNP0A05 and PNP0A06 Container Driver (under ACPI Support menu)
(CONFIG_ACPI_CONTAINER).
(``CONFIG_ACPI_CONTAINER``).
This option can be kernel module too.
@ -145,10 +124,11 @@ sysfs files for memory hotplug
==============================
All memory blocks have their device information in sysfs. Each memory block
is described under /sys/devices/system/memory as:
is described under ``/sys/devices/system/memory`` as::
/sys/devices/system/memory/memoryXXX
(XXX is the memory block id.)
where XXX is the memory block id.
For the memory block covered by the sysfs directory. It is expected that all
memory sections in this range are present and no memory holes exist in the
@ -157,7 +137,7 @@ the existence of one should not affect the hotplug capabilities of the memory
block.
For example, assume 1GiB memory block size. A device for a memory starting at
0x100000000 is /sys/device/system/memory/memory4::
0x100000000 is ``/sys/device/system/memory/memory4``::
(0x100000000 / 1Gib = 4)
@ -165,11 +145,11 @@ This device covers address range [0x100000000 ... 0x140000000)
Under each memory block, you can see 5 files:
- /sys/devices/system/memory/memoryXXX/phys_index
- /sys/devices/system/memory/memoryXXX/phys_device
- /sys/devices/system/memory/memoryXXX/state
- /sys/devices/system/memory/memoryXXX/removable
- /sys/devices/system/memory/memoryXXX/valid_zones
- ``/sys/devices/system/memory/memoryXXX/phys_index``
- ``/sys/devices/system/memory/memoryXXX/phys_device``
- ``/sys/devices/system/memory/memoryXXX/state``
- ``/sys/devices/system/memory/memoryXXX/removable``
- ``/sys/devices/system/memory/memoryXXX/valid_zones``
=================== ============================================================
``phys_index`` read-only and contains memory block id, same as XXX.
@ -207,13 +187,15 @@ Under each memory block, you can see 5 files:
These directories/files appear after physical memory hotplug phase.
If CONFIG_NUMA is enabled the memoryXXX/ directories can also be accessed
via symbolic links located in the /sys/devices/system/node/node* directories.
via symbolic links located in the ``/sys/devices/system/node/node*`` directories.
For example:
/sys/devices/system/node/node0/memory9 -> ../../memory/memory9
For example::
A backlink will also be created:
/sys/devices/system/memory/memory9/node0 -> ../../node/node0
/sys/devices/system/node/node0/memory9 -> ../../memory/memory9
A backlink will also be created::
/sys/devices/system/memory/memory9/node0 -> ../../node/node0
.. _memory_hotplug_physical_mem:
@ -240,7 +222,6 @@ If firmware supports NUMA-node hotplug, and defines an object _HID "ACPI0004",
calls hotplug code for all of objects which are defined in it.
If memory device is found, memory hotplug code will be called.
Notify memory hot-add event by hand
-----------------------------------
@ -251,8 +232,9 @@ CONFIG_ARCH_MEMORY_PROBE and can be configured on powerpc, sh, and x86
if hotplug is supported, although for x86 this should be handled by ACPI
notification.
Probe interface is located at
/sys/devices/system/memory/probe
Probe interface is located at::
/sys/devices/system/memory/probe
You can tell the physical address of new memory to the kernel by::
@ -263,7 +245,6 @@ memory_block_size] memory range is hot-added. In this case, hotplug script is
not called (in current implementation). You'll have to online memory by
yourself. Please see :ref:`memory_hotplug_how_to_online_memory`.
Logical Memory hot-add phase
============================
@ -301,7 +282,7 @@ This sets a global policy and impacts all memory blocks that will subsequently
be hotplugged. Currently offline blocks keep their state. It is possible, under
certain circumstances, that some memory blocks will be added but will fail to
online. User space tools can check their "state" files
(/sys/devices/system/memory/memoryXXX/state) and try to online them manually.
(``/sys/devices/system/memory/memoryXXX/state``) and try to online them manually.
If the automatic onlining wasn't requested, failed, or some memory block was
offlined it is possible to change the individual block's state by writing to the
@ -334,8 +315,6 @@ available memory will be increased.
This may be changed in future.
Logical memory remove
=====================
@ -413,87 +392,45 @@ Need more implementation yet....
- Notification completion of remove works by OS to firmware.
- Guard from remove if not yet.
Memory hotplug event notifier
=============================
Hotplugging events are sent to a notification queue.
Locking Internals
=================
There are six types of notification defined in include/linux/memory.h:
When adding/removing memory that uses memory block devices (i.e. ordinary RAM),
the device_hotplug_lock should be held to:
MEM_GOING_ONLINE
Generated before new memory becomes available in order to be able to
prepare subsystems to handle memory. The page allocator is still unable
to allocate from the new memory.
- synchronize against online/offline requests (e.g. via sysfs). This way, memory
block devices can only be accessed (.online/.state attributes) by user
space once memory has been fully added. And when removing memory, we
know nobody is in critical sections.
- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC)
MEM_CANCEL_ONLINE
Generated if MEMORY_GOING_ONLINE fails.
Especially, there is a possible lock inversion that is avoided using
device_hotplug_lock when adding memory and user space tries to online that
memory faster than expected:
MEM_ONLINE
Generated when memory has successfully brought online. The callback may
allocate pages from the new memory.
- device_online() will first take the device_lock(), followed by
mem_hotplug_lock
- add_memory_resource() will first take the mem_hotplug_lock, followed by
the device_lock() (while creating the devices, during bus_add_device()).
MEM_GOING_OFFLINE
Generated to begin the process of offlining memory. Allocations are no
longer possible from the memory but some of the memory to be offlined
is still in use. The callback can be used to free memory known to a
subsystem from the indicated memory block.
As the device is visible to user space before taking the device_lock(), this
can result in a lock inversion.
MEM_CANCEL_OFFLINE
Generated if MEMORY_GOING_OFFLINE fails. Memory is available again from
the memory block that we attempted to offline.
onlining/offlining of memory should be done via device_online()/
device_offline() - to make sure it is properly synchronized to actions
via sysfs. Holding device_hotplug_lock is advised (to e.g. protect online_type)
MEM_OFFLINE
Generated after offlining memory is complete.
When adding/removing/onlining/offlining memory or adding/removing
heterogeneous/device memory, we should always hold the mem_hotplug_lock in
write mode to serialise memory hotplug (e.g. access to global/zone
variables).
A callback routine can be registered by calling::
In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in read
mode allows for a quite efficient get_online_mems/put_online_mems
implementation, so code accessing memory can protect from that memory
vanishing.
hotplug_memory_notifier(callback_func, priority)
Callback functions with higher values of priority are called before callback
functions with lower values.
A callback function must have the following prototype::
int callback_func(
struct notifier_block *self, unsigned long action, void *arg);
The first argument of the callback function (self) is a pointer to the block
of the notifier chain that points to the callback function itself.
The second argument (action) is one of the event types described above.
The third argument (arg) passes a pointer of struct memory_notify::
struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
int status_change_nid_normal;
int status_change_nid_high;
int status_change_nid;
}
- start_pfn is start_pfn of online/offline memory.
- nr_pages is # of pages of online/offline memory.
- status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
is (will be) set/clear, if this is -1, then nodemask status is not changed.
- status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask
is (will be) set/clear, if this is -1, then nodemask status is not changed.
- status_change_nid is set node id when N_MEMORY of nodemask is (will be)
set/clear. It means a new(memoryless) node gets new memory by online and a
node loses all memory. If this is -1, then nodemask status is not changed.
If status_changed_nid* >= 0, callback should create/discard structures for the
node if necessary.
The callback routine shall return one of the values
NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
defined in include/linux/notifier.h
NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.
NOTIFY_BAD is used as response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE,
MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops
further processing of the notification queue.
NOTIFY_STOP stops further processing of the notification queue.
Future Work
===========

View File

@ -465,6 +465,13 @@ Next, the following policy attributes have special meaning if
policy for the time interval between the last two invocations of the
driver's utilization update callback by the CPU scheduler for that CPU.
One more policy attribute is present if the `HWP feature is enabled in the
processor <Active Mode With HWP_>`_:
``base_frequency``
Shows the base frequency of the CPU. Any frequency above this will be
in the turbo frequency range.
The meaning of these attributes in the `passive mode <Passive Mode_>`_ is the
same as for other scaling drivers.

View File

@ -26,23 +26,34 @@ information is helpful. Any exploit code is very helpful and will not
be released without consent from the reporter unless it has already been
made public.
Disclosure
----------
Disclosure and embargoed information
------------------------------------
The goal of the Linux kernel security team is to work with the bug
submitter to understand and fix the bug. We prefer to publish the fix as
soon as possible, but try to avoid public discussion of the bug itself
and leave that to others.
The security list is not a disclosure channel. For that, see Coordination
below.
Publishing the fix may be delayed when the bug or the fix is not yet
fully understood, the solution is not well-tested or for vendor
coordination. However, we expect these delays to be short, measurable in
days, not weeks or months. A release date is negotiated by the security
team working with the bug submitter as well as vendors. However, the
kernel security team holds the final say when setting a timeframe. The
timeframe varies from immediate (esp. if it's already publicly known bug)
to a few weeks. As a basic default policy, we expect report date to
release date to be on the order of 7 days.
Once a robust fix has been developed, our preference is to release the
fix in a timely fashion, treating it no differently than any of the other
thousands of changes and fixes the Linux kernel project releases every
month.
However, at the request of the reporter, we will postpone releasing the
fix for up to 5 business days after the date of the report or after the
embargo has lifted; whichever comes first. The only exception to that
rule is if the bug is publicly known, in which case the preference is to
release the fix as soon as it's available.
Whilst embargoed information may be shared with trusted individuals in
order to develop a fix, such information will not be published alongside
the fix or on any other disclosure channel without the permission of the
reporter. This includes but is not limited to the original bug report
and followup discussions (if any), exploits, CVE information or the
identity of the reporter.
In other words our only interest is in getting bugs fixed. All other
information submitted to the security list and any followup discussions
of the report are treated confidentially even after the embargo has been
lifted, in perpetuity.
Coordination
------------
@ -68,7 +79,7 @@ may delay the bug handling. If a reporter wishes to have a CVE identifier
assigned ahead of public disclosure, they will need to contact the private
linux-distros list, described above. When such a CVE identifier is known
before a patch is provided, it is desirable to mention it in the commit
message, though.
message if the reporter agrees.
Non-disclosure agreements
-------------------------

View File

@ -1,50 +0,0 @@
00-INDEX
- this file
Booting
- requirements for booting
CCN.txt
- Cache Coherent Network ring-bus and perf PMU driver.
Interrupts
- ARM Interrupt subsystem documentation
IXP4xx
- Intel IXP4xx Network processor.
Netwinder
- Netwinder specific documentation
Porting
- Symbol definitions for porting Linux to a new ARM machine.
Setup
- Kernel initialization parameters on ARM Linux
README
- General ARM documentation
SA1100/
- SA1100 documentation
Samsung-S3C24XX/
- S3C24XX ARM Linux Overview
SPEAr/
- ST SPEAr platform Linux Overview
VFP/
- Release notes for Linux Kernel Vector Floating Point support code
cluster-pm-race-avoidance.txt
- Algorithm for CPU and Cluster setup/teardown
empeg/
- Ltd's Empeg MP3 Car Audio Player
firmware.txt
- Secure firmware registration and calling.
kernel_mode_neon.txt
- How to use NEON instructions in kernel mode
kernel_user_helpers.txt
- Helper functions in kernel space made available for userspace.
mem_alignment
- alignment abort handler documentation
memory.txt
- description of the virtual memory layout
nwfpe/
- NWFPE floating point emulator documentation
swp_emulation
- SWP/SWPB emulation handler/logging description
tcm.txt
- ARM Tightly Coupled Memory
uefi.txt
- [U]EFI configuration and runtime services documentation
vlocks.txt
- Voting locks, low-level mechanism relying on memory system atomic writes.

View File

@ -26,6 +26,7 @@ Offset Value Purpose
0x20 0xfcba0d10 (Magic cookie) AFTR
0x24 exynos_cpu_resume_ns AFTR
0x28 + 4*cpu 0x8 (Magic cookie, Exynos3250) AFTR
0x28 0x0 or last value during resume (Exynos542x) System suspend
2. Secure mode

View File

@ -78,11 +78,11 @@ HWCAP_EVTSTRM
HWCAP_AES
Functionality implied by ID_AA64ISAR1_EL1.AES == 0b0001.
Functionality implied by ID_AA64ISAR0_EL1.AES == 0b0001.
HWCAP_PMULL
Functionality implied by ID_AA64ISAR1_EL1.AES == 0b0010.
Functionality implied by ID_AA64ISAR0_EL1.AES == 0b0010.
HWCAP_SHA1
@ -153,7 +153,7 @@ HWCAP_ASIMDDP
HWCAP_SHA512
Functionality implied by ID_AA64ISAR0_EL1.SHA2 == 0b0002.
Functionality implied by ID_AA64ISAR0_EL1.SHA2 == 0b0010.
HWCAP_SVE
@ -173,8 +173,12 @@ HWCAP_USCAT
HWCAP_ILRCPC
Functionality implied by ID_AA64ISR1_EL1.LRCPC == 0b0002.
Functionality implied by ID_AA64ISAR1_EL1.LRCPC == 0b0010.
HWCAP_FLAGM
Functionality implied by ID_AA64ISAR0_EL1.TS == 0b0001.
HWCAP_SSBS
Functionality implied by ID_AA64PFR1_EL1.SSBS == 0b0010.

View File

@ -0,0 +1,38 @@
HugeTLBpage on ARM64
====================
Hugepage relies on making efficient use of TLBs to improve performance of
address translations. The benefit depends on both -
- the size of hugepages
- size of entries supported by the TLBs
The ARM64 port supports two flavours of hugepages.
1) Block mappings at the pud/pmd level
--------------------------------------
These are regular hugepages where a pmd or a pud page table entry points to a
block of memory. Regardless of the supported size of entries in TLB, block
mappings reduce the depth of page table walk needed to translate hugepage
addresses.
2) Using the Contiguous bit
---------------------------
The architecture provides a contiguous bit in the translation table entries
(D4.5.3, ARM DDI 0487C.a) that hints to the MMU to indicate that it is one of a
contiguous set of entries that can be cached in a single TLB entry.
The contiguous bit is used in Linux to increase the mapping size at the pmd and
pte (last) level. The number of supported contiguous entries varies by page size
and level of the page table.
The following hugepage sizes are supported -
CONT PTE PMD CONT PMD PUD
-------- --- -------- ---
4K: 64K 2M 32M 1G
16K: 2M 32M 1G
64K: 2M 512M 16G

View File

@ -56,6 +56,7 @@ stable kernels.
| ARM | Cortex-A72 | #853709 | N/A |
| ARM | Cortex-A73 | #858921 | ARM64_ERRATUM_858921 |
| ARM | Cortex-A55 | #1024718 | ARM64_ERRATUM_1024718 |
| ARM | Cortex-A76 | #1188873 | ARM64_ERRATUM_1188873 |
| ARM | MMU-500 | #841119,#826419 | N/A |
| | | | |
| Cavium | ThunderX ITS | #22375, #24313 | CAVIUM_ERRATUM_22375 |

View File

@ -1,34 +0,0 @@
00-INDEX
- This file
bfq-iosched.txt
- BFQ IO scheduler and its tunables
biodoc.txt
- Notes on the Generic Block Layer Rewrite in Linux 2.5
biovecs.txt
- Immutable biovecs and biovec iterators
capability.txt
- Generic Block Device Capability (/sys/block/<device>/capability)
cfq-iosched.txt
- CFQ IO scheduler tunables
cmdline-partition.txt
- how to specify block device partitions on kernel command line
data-integrity.txt
- Block data integrity
deadline-iosched.txt
- Deadline IO scheduler tunables
ioprio.txt
- Block io priorities (in CFQ scheduler)
pr.txt
- Block layer support for Persistent Reservations
null_blk.txt
- Null block for block-layer benchmarking.
queue-sysfs.txt
- Queue's sysfs entries
request.txt
- The members of struct request (in include/linux/blkdev.h)
stat.txt
- Block layer statistics in /sys/block/<device>/stat
switching-sched.txt
- Switching I/O schedulers at runtime
writeback_cache_control.txt
- Control of volatile write back caches

View File

@ -1,18 +0,0 @@
00-INDEX
- this file
README.DAC960
- info on Mylex DAC960/DAC1100 PCI RAID Controller Driver for Linux.
cciss.txt
- info, major/minor #'s for Compaq's SMART Array Controllers.
cpqarray.txt
- info on using Compaq's SMART2 Intelligent Disk Array Controllers.
floppy.txt
- notes and driver options for the floppy disk driver.
mflash.txt
- info on mGine m(g)flash driver for linux.
nbd.txt
- info on a TCP implementation of a network block device.
paride.txt
- information about the parallel port IDE subsystem.
ramdisk.txt
- short guide on how to set up and use the RAM disk.

View File

@ -1,756 +0,0 @@
Linux Driver for Mylex DAC960/AcceleRAID/eXtremeRAID PCI RAID Controllers
Version 2.2.11 for Linux 2.2.19
Version 2.4.11 for Linux 2.4.12
PRODUCTION RELEASE
11 October 2001
Leonard N. Zubkoff
Dandelion Digital
lnz@dandelion.com
Copyright 1998-2001 by Leonard N. Zubkoff <lnz@dandelion.com>
INTRODUCTION
Mylex, Inc. designs and manufactures a variety of high performance PCI RAID
controllers. Mylex Corporation is located at 34551 Ardenwood Blvd., Fremont,
California 94555, USA and can be reached at 510.796.6100 or on the World Wide
Web at http://www.mylex.com. Mylex Technical Support can be reached by
electronic mail at mylexsup@us.ibm.com, by voice at 510.608.2400, or by FAX at
510.745.7715. Contact information for offices in Europe and Japan is available
on their Web site.
The latest information on Linux support for DAC960 PCI RAID Controllers, as
well as the most recent release of this driver, will always be available from
my Linux Home Page at URL "http://www.dandelion.com/Linux/". The Linux DAC960
driver supports all current Mylex PCI RAID controllers including the new
eXtremeRAID 2000/3000 and AcceleRAID 352/170/160 models which have an entirely
new firmware interface from the older eXtremeRAID 1100, AcceleRAID 150/200/250,
and DAC960PJ/PG/PU/PD/PL. See below for a complete controller list as well as
minimum firmware version requirements. For simplicity, in most places this
documentation refers to DAC960 generically rather than explicitly listing all
the supported models.
Driver bug reports should be sent via electronic mail to "lnz@dandelion.com".
Please include with the bug report the complete configuration messages reported
by the driver at startup, along with any subsequent system messages relevant to
the controller's operation, and a detailed description of your system's
hardware configuration. Driver bugs are actually quite rare; if you encounter
problems with disks being marked offline, for example, please contact Mylex
Technical Support as the problem is related to the hardware configuration
rather than the Linux driver.
Please consult the RAID controller documentation for detailed information
regarding installation and configuration of the controllers. This document
primarily provides information specific to the Linux support.
DRIVER FEATURES
The DAC960 RAID controllers are supported solely as high performance RAID
controllers, not as interfaces to arbitrary SCSI devices. The Linux DAC960
driver operates at the block device level, the same level as the SCSI and IDE
drivers. Unlike other RAID controllers currently supported on Linux, the
DAC960 driver is not dependent on the SCSI subsystem, and hence avoids all the
complexity and unnecessary code that would be associated with an implementation
as a SCSI driver. The DAC960 driver is designed for as high a performance as
possible with no compromises or extra code for compatibility with lower
performance devices. The DAC960 driver includes extensive error logging and
online configuration management capabilities. Except for initial configuration
of the controller and adding new disk drives, most everything can be handled
from Linux while the system is operational.
The DAC960 driver is architected to support up to 8 controllers per system.
Each DAC960 parallel SCSI controller can support up to 15 disk drives per
channel, for a maximum of 60 drives on a four channel controller; the fibre
channel eXtremeRAID 3000 controller supports up to 125 disk drives per loop for
a total of 250 drives. The drives installed on a controller are divided into
one or more "Drive Groups", and then each Drive Group is subdivided further
into 1 to 32 "Logical Drives". Each Logical Drive has a specific RAID Level
and caching policy associated with it, and it appears to Linux as a single
block device. Logical Drives are further subdivided into up to 7 partitions
through the normal Linux and PC disk partitioning schemes. Logical Drives are
also known as "System Drives", and Drive Groups are also called "Packs". Both
terms are in use in the Mylex documentation; I have chosen to standardize on
the more generic "Logical Drive" and "Drive Group".
DAC960 RAID disk devices are named in the style of the obsolete Device File
System (DEVFS). The device corresponding to Logical Drive D on Controller C
is referred to as /dev/rd/cCdD, and the partitions are called /dev/rd/cCdDp1
through /dev/rd/cCdDp7. For example, partition 3 of Logical Drive 5 on
Controller 2 is referred to as /dev/rd/c2d5p3. Note that unlike with SCSI
disks the device names will not change in the event of a disk drive failure.
The DAC960 driver is assigned major numbers 48 - 55 with one major number per
controller. The 8 bits of minor number are divided into 5 bits for the Logical
Drive and 3 bits for the partition.
SUPPORTED DAC960/AcceleRAID/eXtremeRAID PCI RAID CONTROLLERS
The following list comprises the supported DAC960, AcceleRAID, and eXtremeRAID
PCI RAID Controllers as of the date of this document. It is recommended that
anyone purchasing a Mylex PCI RAID Controller not in the following table
contact the author beforehand to verify that it is or will be supported.
eXtremeRAID 3000
1 Wide Ultra-2/LVD SCSI channel
2 External Fibre FC-AL channels
233MHz StrongARM SA 110 Processor
64 Bit 33MHz PCI (backward compatible with 32 Bit PCI slots)
32MB/64MB ECC SDRAM Memory
eXtremeRAID 2000
4 Wide Ultra-160 LVD SCSI channels
233MHz StrongARM SA 110 Processor
64 Bit 33MHz PCI (backward compatible with 32 Bit PCI slots)
32MB/64MB ECC SDRAM Memory
AcceleRAID 352
2 Wide Ultra-160 LVD SCSI channels
100MHz Intel i960RN RISC Processor
64 Bit 33MHz PCI (backward compatible with 32 Bit PCI slots)
32MB/64MB ECC SDRAM Memory
AcceleRAID 170
1 Wide Ultra-160 LVD SCSI channel
100MHz Intel i960RM RISC Processor
16MB/32MB/64MB ECC SDRAM Memory
AcceleRAID 160 (AcceleRAID 170LP)
1 Wide Ultra-160 LVD SCSI channel
100MHz Intel i960RS RISC Processor
Built in 16M ECC SDRAM Memory
PCI Low Profile Form Factor - fit for 2U height
eXtremeRAID 1100 (DAC1164P)
3 Wide Ultra-2/LVD SCSI channels
233MHz StrongARM SA 110 Processor
64 Bit 33MHz PCI (backward compatible with 32 Bit PCI slots)
16MB/32MB/64MB Parity SDRAM Memory with Battery Backup
AcceleRAID 250 (DAC960PTL1)
Uses onboard Symbios SCSI chips on certain motherboards
Also includes one onboard Wide Ultra-2/LVD SCSI Channel
66MHz Intel i960RD RISC Processor
4MB/8MB/16MB/32MB/64MB/128MB ECC EDO Memory
AcceleRAID 200 (DAC960PTL0)
Uses onboard Symbios SCSI chips on certain motherboards
Includes no onboard SCSI Channels
66MHz Intel i960RD RISC Processor
4MB/8MB/16MB/32MB/64MB/128MB ECC EDO Memory
AcceleRAID 150 (DAC960PRL)
Uses onboard Symbios SCSI chips on certain motherboards
Also includes one onboard Wide Ultra-2/LVD SCSI Channel
33MHz Intel i960RP RISC Processor
4MB Parity EDO Memory
DAC960PJ 1/2/3 Wide Ultra SCSI-3 Channels
66MHz Intel i960RD RISC Processor
4MB/8MB/16MB/32MB/64MB/128MB ECC EDO Memory
DAC960PG 1/2/3 Wide Ultra SCSI-3 Channels
33MHz Intel i960RP RISC Processor
4MB/8MB ECC EDO Memory
DAC960PU 1/2/3 Wide Ultra SCSI-3 Channels
Intel i960CF RISC Processor
4MB/8MB EDRAM or 2MB/4MB/8MB/16MB/32MB DRAM Memory
DAC960PD 1/2/3 Wide Fast SCSI-2 Channels
Intel i960CF RISC Processor
4MB/8MB EDRAM or 2MB/4MB/8MB/16MB/32MB DRAM Memory
DAC960PL 1/2/3 Wide Fast SCSI-2 Channels
Intel i960 RISC Processor
2MB/4MB/8MB/16MB/32MB DRAM Memory
DAC960P 1/2/3 Wide Fast SCSI-2 Channels
Intel i960 RISC Processor
2MB/4MB/8MB/16MB/32MB DRAM Memory
For the eXtremeRAID 2000/3000 and AcceleRAID 352/170/160, firmware version
6.00-01 or above is required.
For the eXtremeRAID 1100, firmware version 5.06-0-52 or above is required.
For the AcceleRAID 250, 200, and 150, firmware version 4.06-0-57 or above is
required.
For the DAC960PJ and DAC960PG, firmware version 4.06-0-00 or above is required.
For the DAC960PU, DAC960PD, DAC960PL, and DAC960P, either firmware version
3.51-0-04 or above is required (for dual Flash ROM controllers), or firmware
version 2.73-0-00 or above is required (for single Flash ROM controllers)
Please note that not all SCSI disk drives are suitable for use with DAC960
controllers, and only particular firmware versions of any given model may
actually function correctly. Similarly, not all motherboards have a BIOS that
properly initializes the AcceleRAID 250, AcceleRAID 200, AcceleRAID 150,
DAC960PJ, and DAC960PG because the Intel i960RD/RP is a multi-function device.
If in doubt, contact Mylex RAID Technical Support (mylexsup@us.ibm.com) to
verify compatibility. Mylex makes available a hard disk compatibility list at
http://www.mylex.com/support/hdcomp/hd-lists.html.
DRIVER INSTALLATION
This distribution was prepared for Linux kernel version 2.2.19 or 2.4.12.
To install the DAC960 RAID driver, you may use the following commands,
replacing "/usr/src" with wherever you keep your Linux kernel source tree:
cd /usr/src
tar -xvzf DAC960-2.2.11.tar.gz (or DAC960-2.4.11.tar.gz)
mv README.DAC960 linux/Documentation
mv DAC960.[ch] linux/drivers/block
patch -p0 < DAC960.patch (if DAC960.patch is included)
cd linux
make config
make bzImage (or zImage)
Then install "arch/x86/boot/bzImage" or "arch/x86/boot/zImage" as your
standard kernel, run lilo if appropriate, and reboot.
To create the necessary devices in /dev, the "make_rd" script included in
"DAC960-Utilities.tar.gz" from http://www.dandelion.com/Linux/ may be used.
LILO 21 and FDISK v2.9 include DAC960 support; also included in this archive
are patches to LILO 20 and FDISK v2.8 that add DAC960 support, along with
statically linked executables of LILO and FDISK. This modified version of LILO
will allow booting from a DAC960 controller and/or mounting the root file
system from a DAC960.
Red Hat Linux 6.0 and SuSE Linux 6.1 include support for Mylex PCI RAID
controllers. Installing directly onto a DAC960 may be problematic from other
Linux distributions until their installation utilities are updated.
INSTALLATION NOTES
Before installing Linux or adding DAC960 logical drives to an existing Linux
system, the controller must first be configured to provide one or more logical
drives using the BIOS Configuration Utility or DACCF. Please note that since
there are only at most 6 usable partitions on each logical drive, systems
requiring more partitions should subdivide a drive group into multiple logical
drives, each of which can have up to 6 usable partitions. Also, note that with
large disk arrays it is advisable to enable the 8GB BIOS Geometry (255/63)
rather than accepting the default 2GB BIOS Geometry (128/32); failing to so do
will cause the logical drive geometry to have more than 65535 cylinders which
will make it impossible for FDISK to be used properly. The 8GB BIOS Geometry
can be enabled by configuring the DAC960 BIOS, which is accessible via Alt-M
during the BIOS initialization sequence.
For maximum performance and the most efficient E2FSCK performance, it is
recommended that EXT2 file systems be built with a 4KB block size and 16 block
stride to match the DAC960 controller's 64KB default stripe size. The command
"mke2fs -b 4096 -R stride=16 <device>" is appropriate. Unless there will be a
large number of small files on the file systems, it is also beneficial to add
the "-i 16384" option to increase the bytes per inode parameter thereby
reducing the file system metadata. Finally, on systems that will only be run
with Linux 2.2 or later kernels it is beneficial to enable sparse superblocks
with the "-s 1" option.
DAC960 ANNOUNCEMENTS MAILING LIST
The DAC960 Announcements Mailing List provides a forum for informing Linux
users of new driver releases and other announcements regarding Linux support
for DAC960 PCI RAID Controllers. To join the mailing list, send a message to
"dac960-announce-request@dandelion.com" with the line "subscribe" in the
message body.
CONTROLLER CONFIGURATION AND STATUS MONITORING
The DAC960 RAID controllers running firmware 4.06 or above include a Background
Initialization facility so that system downtime is minimized both for initial
installation and subsequent configuration of additional storage. The BIOS
Configuration Utility (accessible via Alt-R during the BIOS initialization
sequence) is used to quickly configure the controller, and then the logical
drives that have been created are available for immediate use even while they
are still being initialized by the controller. The primary need for online
configuration and status monitoring is then to avoid system downtime when disk
drives fail and must be replaced. Mylex's online monitoring and configuration
utilities are being ported to Linux and will become available at some point in
the future. Note that with a SAF-TE (SCSI Accessed Fault-Tolerant Enclosure)
enclosure, the controller is able to rebuild failed drives automatically as
soon as a drive replacement is made available.
The primary interfaces for controller configuration and status monitoring are
special files created in the /proc/rd/... hierarchy along with the normal
system console logging mechanism. Whenever the system is operating, the DAC960
driver queries each controller for status information every 10 seconds, and
checks for additional conditions every 60 seconds. The initial status of each
controller is always available for controller N in /proc/rd/cN/initial_status,
and the current status as of the last status monitoring query is available in
/proc/rd/cN/current_status. In addition, status changes are also logged by the
driver to the system console and will appear in the log files maintained by
syslog. The progress of asynchronous rebuild or consistency check operations
is also available in /proc/rd/cN/current_status, and progress messages are
logged to the system console at most every 60 seconds.
Starting with the 2.2.3/2.0.3 versions of the driver, the status information
available in /proc/rd/cN/initial_status and /proc/rd/cN/current_status has been
augmented to include the vendor, model, revision, and serial number (if
available) for each physical device found connected to the controller:
***** DAC960 RAID Driver Version 2.2.3 of 19 August 1999 *****
Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com>
Configuring Mylex DAC960PRL PCI RAID Controller
Firmware Version: 4.07-0-07, Channels: 1, Memory Size: 16MB
PCI Bus: 1, Device: 4, Function: 1, I/O Address: Unassigned
PCI Address: 0xFE300000 mapped at 0xA0800000, IRQ Channel: 21
Controller Queue Depth: 128, Maximum Blocks per Command: 128
Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33
Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
SAF-TE Enclosure Management Enabled
Physical Devices:
0:0 Vendor: IBM Model: DRVS09D Revision: 0270
Serial Number: 68016775HA
Disk Status: Online, 17928192 blocks
0:1 Vendor: IBM Model: DRVS09D Revision: 0270
Serial Number: 68004E53HA
Disk Status: Online, 17928192 blocks
0:2 Vendor: IBM Model: DRVS09D Revision: 0270
Serial Number: 13013935HA
Disk Status: Online, 17928192 blocks
0:3 Vendor: IBM Model: DRVS09D Revision: 0270
Serial Number: 13016897HA
Disk Status: Online, 17928192 blocks
0:4 Vendor: IBM Model: DRVS09D Revision: 0270
Serial Number: 68019905HA
Disk Status: Online, 17928192 blocks
0:5 Vendor: IBM Model: DRVS09D Revision: 0270
Serial Number: 68012753HA
Disk Status: Online, 17928192 blocks
0:6 Vendor: ESG-SHV Model: SCA HSBP M6 Revision: 0.61
Logical Drives:
/dev/rd/c0d0: RAID-5, Online, 89640960 blocks, Write Thru
No Rebuild or Consistency Check in Progress
To simplify the monitoring process for custom software, the special file
/proc/rd/status returns "OK" when all DAC960 controllers in the system are
operating normally and no failures have occurred, or "ALERT" if any logical
drives are offline or critical or any non-standby physical drives are dead.
Configuration commands for controller N are available via the special file
/proc/rd/cN/user_command. A human readable command can be written to this
special file to initiate a configuration operation, and the results of the
operation can then be read back from the special file in addition to being
logged to the system console. The shell command sequence
echo "<configuration-command>" > /proc/rd/c0/user_command
cat /proc/rd/c0/user_command
is typically used to execute configuration commands. The configuration
commands are:
flush-cache
The "flush-cache" command flushes the controller's cache. The system
automatically flushes the cache at shutdown or if the driver module is
unloaded, so this command is only needed to be certain a write back cache
is flushed to disk before the system is powered off by a command to a UPS.
Note that the flush-cache command also stops an asynchronous rebuild or
consistency check, so it should not be used except when the system is being
halted.
kill <channel>:<target-id>
The "kill" command marks the physical drive <channel>:<target-id> as DEAD.
This command is provided primarily for testing, and should not be used
during normal system operation.
make-online <channel>:<target-id>
The "make-online" command changes the physical drive <channel>:<target-id>
from status DEAD to status ONLINE. In cases where multiple physical drives
have been killed simultaneously, this command may be used to bring all but
one of them back online, after which a rebuild to the final drive is
necessary.
Warning: make-online should only be used on a dead physical drive that is
an active part of a drive group, never on a standby drive. The command
should never be used on a dead drive that is part of a critical logical
drive; rebuild should be used if only a single drive is dead.
make-standby <channel>:<target-id>
The "make-standby" command changes physical drive <channel>:<target-id>
from status DEAD to status STANDBY. It should only be used in cases where
a dead drive was replaced after an automatic rebuild was performed onto a
standby drive. It cannot be used to add a standby drive to the controller
configuration if one was not created initially; the BIOS Configuration
Utility must be used for that currently.
rebuild <channel>:<target-id>
The "rebuild" command initiates an asynchronous rebuild onto physical drive
<channel>:<target-id>. It should only be used when a dead drive has been
replaced.
check-consistency <logical-drive-number>
The "check-consistency" command initiates an asynchronous consistency check
of <logical-drive-number> with automatic restoration. It can be used
whenever it is desired to verify the consistency of the redundancy
information.
cancel-rebuild
cancel-consistency-check
The "cancel-rebuild" and "cancel-consistency-check" commands cancel any
rebuild or consistency check operations previously initiated.
EXAMPLE I - DRIVE FAILURE WITHOUT A STANDBY DRIVE
The following annotated logs demonstrate the controller configuration and and
online status monitoring capabilities of the Linux DAC960 Driver. The test
configuration comprises 6 1GB Quantum Atlas I disk drives on two channels of a
DAC960PJ controller. The physical drives are configured into a single drive
group without a standby drive, and the drive group has been configured into two
logical drives, one RAID-5 and one RAID-6. Note that these logs are from an
earlier version of the driver and the messages have changed somewhat with newer
releases, but the functionality remains similar. First, here is the current
status of the RAID configuration:
gwynedd:/u/lnz# cat /proc/rd/c0/current_status
***** DAC960 RAID Driver Version 2.0.0 of 23 March 1999 *****
Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com>
Configuring Mylex DAC960PJ PCI RAID Controller
Firmware Version: 4.06-0-08, Channels: 3, Memory Size: 8MB
PCI Bus: 0, Device: 19, Function: 1, I/O Address: Unassigned
PCI Address: 0xFD4FC000 mapped at 0x8807000, IRQ Channel: 9
Controller Queue Depth: 128, Maximum Blocks per Command: 128
Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33
Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
Physical Devices:
0:1 - Disk: Online, 2201600 blocks
0:2 - Disk: Online, 2201600 blocks
0:3 - Disk: Online, 2201600 blocks
1:1 - Disk: Online, 2201600 blocks
1:2 - Disk: Online, 2201600 blocks
1:3 - Disk: Online, 2201600 blocks
Logical Drives:
/dev/rd/c0d0: RAID-5, Online, 5498880 blocks, Write Thru
/dev/rd/c0d1: RAID-6, Online, 3305472 blocks, Write Thru
No Rebuild or Consistency Check in Progress
gwynedd:/u/lnz# cat /proc/rd/status
OK
The above messages indicate that everything is healthy, and /proc/rd/status
returns "OK" indicating that there are no problems with any DAC960 controller
in the system. For demonstration purposes, while I/O is active Physical Drive
1:1 is now disconnected, simulating a drive failure. The failure is noted by
the driver within 10 seconds of the controller's having detected it, and the
driver logs the following console status messages indicating that Logical
Drives 0 and 1 are now CRITICAL as a result of Physical Drive 1:1 being DEAD:
DAC960#0: Physical Drive 1:2 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02
DAC960#0: Physical Drive 1:3 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02
DAC960#0: Physical Drive 1:1 killed because of timeout on SCSI command
DAC960#0: Physical Drive 1:1 is now DEAD
DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now CRITICAL
DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now CRITICAL
The Sense Keys logged here are just Check Condition / Unit Attention conditions
arising from a SCSI bus reset that is forced by the controller during its error
recovery procedures. Concurrently with the above, the driver status available
from /proc/rd also reflects the drive failure. The status message in
/proc/rd/status has changed from "OK" to "ALERT":
gwynedd:/u/lnz# cat /proc/rd/status
ALERT
and /proc/rd/c0/current_status has been updated:
gwynedd:/u/lnz# cat /proc/rd/c0/current_status
...
Physical Devices:
0:1 - Disk: Online, 2201600 blocks
0:2 - Disk: Online, 2201600 blocks
0:3 - Disk: Online, 2201600 blocks
1:1 - Disk: Dead, 2201600 blocks
1:2 - Disk: Online, 2201600 blocks
1:3 - Disk: Online, 2201600 blocks
Logical Drives:
/dev/rd/c0d0: RAID-5, Critical, 5498880 blocks, Write Thru
/dev/rd/c0d1: RAID-6, Critical, 3305472 blocks, Write Thru
No Rebuild or Consistency Check in Progress
Since there are no standby drives configured, the system can continue to access
the logical drives in a performance degraded mode until the failed drive is
replaced and a rebuild operation completed to restore the redundancy of the
logical drives. Once Physical Drive 1:1 is replaced with a properly
functioning drive, or if the physical drive was killed without having failed
(e.g., due to electrical problems on the SCSI bus), the user can instruct the
controller to initiate a rebuild operation onto the newly replaced drive:
gwynedd:/u/lnz# echo "rebuild 1:1" > /proc/rd/c0/user_command
gwynedd:/u/lnz# cat /proc/rd/c0/user_command
Rebuild of Physical Drive 1:1 Initiated
The echo command instructs the controller to initiate an asynchronous rebuild
operation onto Physical Drive 1:1, and the status message that results from the
operation is then available for reading from /proc/rd/c0/user_command, as well
as being logged to the console by the driver.
Within 10 seconds of this command the driver logs the initiation of the
asynchronous rebuild operation:
DAC960#0: Rebuild of Physical Drive 1:1 Initiated
DAC960#0: Physical Drive 1:1 Error Log: Sense Key = 6, ASC = 29, ASCQ = 01
DAC960#0: Physical Drive 1:1 is now WRITE-ONLY
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 1% completed
and /proc/rd/c0/current_status is updated:
gwynedd:/u/lnz# cat /proc/rd/c0/current_status
...
Physical Devices:
0:1 - Disk: Online, 2201600 blocks
0:2 - Disk: Online, 2201600 blocks
0:3 - Disk: Online, 2201600 blocks
1:1 - Disk: Write-Only, 2201600 blocks
1:2 - Disk: Online, 2201600 blocks
1:3 - Disk: Online, 2201600 blocks
Logical Drives:
/dev/rd/c0d0: RAID-5, Critical, 5498880 blocks, Write Thru
/dev/rd/c0d1: RAID-6, Critical, 3305472 blocks, Write Thru
Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 6% completed
As the rebuild progresses, the current status in /proc/rd/c0/current_status is
updated every 10 seconds:
gwynedd:/u/lnz# cat /proc/rd/c0/current_status
...
Physical Devices:
0:1 - Disk: Online, 2201600 blocks
0:2 - Disk: Online, 2201600 blocks
0:3 - Disk: Online, 2201600 blocks
1:1 - Disk: Write-Only, 2201600 blocks
1:2 - Disk: Online, 2201600 blocks
1:3 - Disk: Online, 2201600 blocks
Logical Drives:
/dev/rd/c0d0: RAID-5, Critical, 5498880 blocks, Write Thru
/dev/rd/c0d1: RAID-6, Critical, 3305472 blocks, Write Thru
Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 15% completed
and every minute a progress message is logged to the console by the driver:
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 32% completed
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 63% completed
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 94% completed
DAC960#0: Rebuild in Progress: Logical Drive 1 (/dev/rd/c0d1) 94% completed
Finally, the rebuild completes successfully. The driver logs the status of the
logical and physical drives and the rebuild completion:
DAC960#0: Rebuild Completed Successfully
DAC960#0: Physical Drive 1:1 is now ONLINE
DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now ONLINE
DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now ONLINE
/proc/rd/c0/current_status is updated:
gwynedd:/u/lnz# cat /proc/rd/c0/current_status
...
Physical Devices:
0:1 - Disk: Online, 2201600 blocks
0:2 - Disk: Online, 2201600 blocks
0:3 - Disk: Online, 2201600 blocks
1:1 - Disk: Online, 2201600 blocks
1:2 - Disk: Online, 2201600 blocks
1:3 - Disk: Online, 2201600 blocks
Logical Drives:
/dev/rd/c0d0: RAID-5, Online, 5498880 blocks, Write Thru
/dev/rd/c0d1: RAID-6, Online, 3305472 blocks, Write Thru
Rebuild Completed Successfully
and /proc/rd/status indicates that everything is healthy once again:
gwynedd:/u/lnz# cat /proc/rd/status
OK
EXAMPLE II - DRIVE FAILURE WITH A STANDBY DRIVE
The following annotated logs demonstrate the controller configuration and and
online status monitoring capabilities of the Linux DAC960 Driver. The test
configuration comprises 6 1GB Quantum Atlas I disk drives on two channels of a
DAC960PJ controller. The physical drives are configured into a single drive
group with a standby drive, and the drive group has been configured into two
logical drives, one RAID-5 and one RAID-6. Note that these logs are from an
earlier version of the driver and the messages have changed somewhat with newer
releases, but the functionality remains similar. First, here is the current
status of the RAID configuration:
gwynedd:/u/lnz# cat /proc/rd/c0/current_status
***** DAC960 RAID Driver Version 2.0.0 of 23 March 1999 *****
Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com>
Configuring Mylex DAC960PJ PCI RAID Controller
Firmware Version: 4.06-0-08, Channels: 3, Memory Size: 8MB
PCI Bus: 0, Device: 19, Function: 1, I/O Address: Unassigned
PCI Address: 0xFD4FC000 mapped at 0x8807000, IRQ Channel: 9
Controller Queue Depth: 128, Maximum Blocks per Command: 128
Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33
Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
Physical Devices:
0:1 - Disk: Online, 2201600 blocks
0:2 - Disk: Online, 2201600 blocks
0:3 - Disk: Online, 2201600 blocks
1:1 - Disk: Online, 2201600 blocks
1:2 - Disk: Online, 2201600 blocks
1:3 - Disk: Standby, 2201600 blocks
Logical Drives:
/dev/rd/c0d0: RAID-5, Online, 4399104 blocks, Write Thru
/dev/rd/c0d1: RAID-6, Online, 2754560 blocks, Write Thru
No Rebuild or Consistency Check in Progress
gwynedd:/u/lnz# cat /proc/rd/status
OK
The above messages indicate that everything is healthy, and /proc/rd/status
returns "OK" indicating that there are no problems with any DAC960 controller
in the system. For demonstration purposes, while I/O is active Physical Drive
1:2 is now disconnected, simulating a drive failure. The failure is noted by
the driver within 10 seconds of the controller's having detected it, and the
driver logs the following console status messages:
DAC960#0: Physical Drive 1:1 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02
DAC960#0: Physical Drive 1:3 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02
DAC960#0: Physical Drive 1:2 killed because of timeout on SCSI command
DAC960#0: Physical Drive 1:2 is now DEAD
DAC960#0: Physical Drive 1:2 killed because it was removed
DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now CRITICAL
DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now CRITICAL
Since a standby drive is configured, the controller automatically begins
rebuilding onto the standby drive:
DAC960#0: Physical Drive 1:3 is now WRITE-ONLY
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 4% completed
Concurrently with the above, the driver status available from /proc/rd also
reflects the drive failure and automatic rebuild. The status message in
/proc/rd/status has changed from "OK" to "ALERT":
gwynedd:/u/lnz# cat /proc/rd/status
ALERT
and /proc/rd/c0/current_status has been updated:
gwynedd:/u/lnz# cat /proc/rd/c0/current_status
...
Physical Devices:
0:1 - Disk: Online, 2201600 blocks
0:2 - Disk: Online, 2201600 blocks
0:3 - Disk: Online, 2201600 blocks
1:1 - Disk: Online, 2201600 blocks
1:2 - Disk: Dead, 2201600 blocks
1:3 - Disk: Write-Only, 2201600 blocks
Logical Drives:
/dev/rd/c0d0: RAID-5, Critical, 4399104 blocks, Write Thru
/dev/rd/c0d1: RAID-6, Critical, 2754560 blocks, Write Thru
Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 4% completed
As the rebuild progresses, the current status in /proc/rd/c0/current_status is
updated every 10 seconds:
gwynedd:/u/lnz# cat /proc/rd/c0/current_status
...
Physical Devices:
0:1 - Disk: Online, 2201600 blocks
0:2 - Disk: Online, 2201600 blocks
0:3 - Disk: Online, 2201600 blocks
1:1 - Disk: Online, 2201600 blocks
1:2 - Disk: Dead, 2201600 blocks
1:3 - Disk: Write-Only, 2201600 blocks
Logical Drives:
/dev/rd/c0d0: RAID-5, Critical, 4399104 blocks, Write Thru
/dev/rd/c0d1: RAID-6, Critical, 2754560 blocks, Write Thru
Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 40% completed
and every minute a progress message is logged on the console by the driver:
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 40% completed
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 76% completed
DAC960#0: Rebuild in Progress: Logical Drive 1 (/dev/rd/c0d1) 66% completed
DAC960#0: Rebuild in Progress: Logical Drive 1 (/dev/rd/c0d1) 84% completed
Finally, the rebuild completes successfully. The driver logs the status of the
logical and physical drives and the rebuild completion:
DAC960#0: Rebuild Completed Successfully
DAC960#0: Physical Drive 1:3 is now ONLINE
DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now ONLINE
DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now ONLINE
/proc/rd/c0/current_status is updated:
***** DAC960 RAID Driver Version 2.0.0 of 23 March 1999 *****
Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com>
Configuring Mylex DAC960PJ PCI RAID Controller
Firmware Version: 4.06-0-08, Channels: 3, Memory Size: 8MB
PCI Bus: 0, Device: 19, Function: 1, I/O Address: Unassigned
PCI Address: 0xFD4FC000 mapped at 0x8807000, IRQ Channel: 9
Controller Queue Depth: 128, Maximum Blocks per Command: 128
Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33
Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
Physical Devices:
0:1 - Disk: Online, 2201600 blocks
0:2 - Disk: Online, 2201600 blocks
0:3 - Disk: Online, 2201600 blocks
1:1 - Disk: Online, 2201600 blocks
1:2 - Disk: Dead, 2201600 blocks
1:3 - Disk: Online, 2201600 blocks
Logical Drives:
/dev/rd/c0d0: RAID-5, Online, 4399104 blocks, Write Thru
/dev/rd/c0d1: RAID-6, Online, 2754560 blocks, Write Thru
Rebuild Completed Successfully
and /proc/rd/status indicates that everything is healthy once again:
gwynedd:/u/lnz# cat /proc/rd/status
OK
Note that the absence of a viable standby drive does not create an "ALERT"
status. Once dead Physical Drive 1:2 has been replaced, the controller must be
told that this has occurred and that the newly replaced drive should become the
new standby drive:
gwynedd:/u/lnz# echo "make-standby 1:2" > /proc/rd/c0/user_command
gwynedd:/u/lnz# cat /proc/rd/c0/user_command
Make Standby of Physical Drive 1:2 Succeeded
The echo command instructs the controller to make Physical Drive 1:2 into a
standby drive, and the status message that results from the operation is then
available for reading from /proc/rd/c0/user_command, as well as being logged to
the console by the driver. Within 60 seconds of this command the driver logs:
DAC960#0: Physical Drive 1:2 Error Log: Sense Key = 6, ASC = 29, ASCQ = 01
DAC960#0: Physical Drive 1:2 is now STANDBY
DAC960#0: Make Standby of Physical Drive 1:2 Succeeded
and /proc/rd/c0/current_status is updated:
gwynedd:/u/lnz# cat /proc/rd/c0/current_status
...
Physical Devices:
0:1 - Disk: Online, 2201600 blocks
0:2 - Disk: Online, 2201600 blocks
0:3 - Disk: Online, 2201600 blocks
1:1 - Disk: Online, 2201600 blocks
1:2 - Disk: Standby, 2201600 blocks
1:3 - Disk: Online, 2201600 blocks
Logical Drives:
/dev/rd/c0d0: RAID-5, Online, 4399104 blocks, Write Thru
/dev/rd/c0d1: RAID-6, Online, 2754560 blocks, Write Thru
Rebuild Completed Successfully

View File

@ -190,7 +190,7 @@ whitespace:
notify_free Depending on device usage scenario it may account
a) the number of pages freed because of swap slot free
notifications or b) the number of pages freed because of
REQ_DISCARD requests sent by bio. The former ones are
REQ_OP_DISCARD requests sent by bio. The former ones are
sent to a swap block device when a swap slot is freed,
which implies that this disk is being used as a swap disk.
The latter ones are sent by filesystem mounted with

View File

@ -1,11 +0,0 @@
00-INDEX
- this file (info on CD-ROMs and Linux)
Makefile
- only used to generate TeX output from the documentation.
cdrom-standard.tex
- LaTeX document on standardizing the CD-ROM programming interface.
ide-cd
- info on setting up and using ATAPI (aka IDE) CD-ROMs.
packet-writing.txt
- Info on the CDRW packet writing module

View File

@ -1,26 +0,0 @@
00-INDEX
- this file
blkio-controller.txt
- Description for Block IO Controller, implementation and usage details.
cgroups.txt
- Control Groups definition, implementation details, examples and API.
cpuacct.txt
- CPU Accounting Controller; account CPU usage for groups of tasks.
cpusets.txt
- documents the cpusets feature; assign CPUs and Mem to a set of tasks.
admin-guide/devices.rst
- Device Whitelist Controller; description, interface and security.
freezer-subsystem.txt
- checkpointing; rationale to not use signals, interface.
hugetlb.txt
- HugeTLB Controller implementation and usage details.
memcg_test.txt
- Memory Resource Controller; implementation details.
memory.txt
- Memory Resource Controller; design, accounting, interface, testing.
net_cls.txt
- Network classifier cgroups details and usages.
net_prio.txt
- Network priority cgroups details and usages.
pids.txt
- Process number cgroups details and usages.

View File

@ -27,7 +27,7 @@ cgroup.
Currently user space applications can easily take away all the rdma verb
specific resources such as AH, CQ, QP, MR etc. Due to which other applications
in other cgroup or kernel space ULPs may not even get chance to allocate any
rdma resources. This can leads to service unavailability.
rdma resources. This can lead to service unavailability.
Therefore RDMA controller is needed through which resource consumption
of processes can be limited. Through this controller different rdma

View File

@ -259,7 +259,7 @@ latex_elements = {
'papersize': 'a4paper',
# The font size ('10pt', '11pt' or '12pt').
'pointsize': '8pt',
'pointsize': '11pt',
# Latex figure (float) alignment
#'figure_align': 'htbp',
@ -272,8 +272,8 @@ latex_elements = {
'preamble': '''
% Use some font with UTF-8 support with XeLaTeX
\\usepackage{fontspec}
\\setsansfont{DejaVu Serif}
\\setromanfont{DejaVu Sans}
\\setsansfont{DejaVu Sans}
\\setromanfont{DejaVu Serif}
\\setmonofont{DejaVu Sans Mono}
'''
@ -383,6 +383,10 @@ latex_documents = [
'The kernel development community', 'manual'),
('filesystems/index', 'filesystems.tex', 'Linux Filesystems API',
'The kernel development community', 'manual'),
('admin-guide/ext4', 'ext4-admin-guide.tex', 'ext4 Administration Guide',
'ext4 Community', 'manual'),
('filesystems/ext4/index', 'ext4-data-structures.tex',
'ext4 Data Structures and Algorithms', 'ext4 Community', 'manual'),
('gpu/index', 'gpu.tex', 'Linux GPU Driver Developer\'s Guide',
'The kernel development community', 'manual'),
('input/index', 'linux-input.tex', 'The Linux input driver subsystem',

View File

@ -5,54 +5,23 @@ Boot time memory management
Early system initialization cannot use "normal" memory management
simply because it is not set up yet. But there is still need to
allocate memory for various data structures, for instance for the
physical page allocator. To address this, a specialized allocator
called the :ref:`Boot Memory Allocator <bootmem>`, or bootmem, was
introduced. Several years later PowerPC developers added a "Logical
Memory Blocks" allocator, which was later adopted by other
architectures and renamed to :ref:`memblock <memblock>`. There is also
a compatibility layer called `nobootmem` that translates bootmem
allocation interfaces to memblock calls.
physical page allocator.
The selection of the early allocator is done using
``CONFIG_NO_BOOTMEM`` and ``CONFIG_HAVE_MEMBLOCK`` kernel
configuration options. These options are enabled or disabled
statically by the architectures' Kconfig files.
* Architectures that rely only on bootmem select
``CONFIG_NO_BOOTMEM=n && CONFIG_HAVE_MEMBLOCK=n``.
* The users of memblock with the nobootmem compatibility layer set
``CONFIG_NO_BOOTMEM=y && CONFIG_HAVE_MEMBLOCK=y``.
* And for those that use both memblock and bootmem the configuration
includes ``CONFIG_NO_BOOTMEM=n && CONFIG_HAVE_MEMBLOCK=y``.
Whichever allocator is used, it is the responsibility of the
architecture specific initialization to set it up in
:c:func:`setup_arch` and tear it down in :c:func:`mem_init` functions.
A specialized allocator called ``memblock`` performs the
boot time memory management. The architecture specific initialization
must set it up in :c:func:`setup_arch` and tear it down in
:c:func:`mem_init` functions.
Once the early memory management is available it offers a variety of
functions and macros for memory allocations. The allocation request
may be directed to the first (and probably the only) node or to a
particular node in a NUMA system. There are API variants that panic
when an allocation fails and those that don't. And more recent and
advanced memblock even allows controlling its own behaviour.
when an allocation fails and those that don't.
.. _bootmem:
Memblock also offers a variety of APIs that control its own behaviour.
Bootmem
=======
(mostly stolen from Mel Gorman's "Understanding the Linux Virtual
Memory Manager" `book`_)
.. _book: https://www.kernel.org/doc/gorman/
.. kernel-doc:: mm/bootmem.c
:doc: bootmem overview
.. _memblock:
Memblock
========
Memblock Overview
=================
.. kernel-doc:: mm/memblock.c
:doc: memblock overview
@ -61,26 +30,6 @@ Memblock
Functions and structures
========================
Common API
----------
The functions that are described in this section are available
regardless of what early memory manager is enabled.
.. kernel-doc:: mm/nobootmem.c
Bootmem specific API
--------------------
These interfaces available only with bootmem, i.e when ``CONFIG_NO_BOOTMEM=n``
.. kernel-doc:: include/linux/bootmem.h
.. kernel-doc:: mm/bootmem.c
:nodocs:
Memblock specific API
---------------------
Here is the description of memblock data structures, functions and
macros. Some of them are actually internal, but since they are
documented it would be silly to omit them. Besides, reading the
@ -89,4 +38,4 @@ really happens under the hood.
.. kernel-doc:: include/linux/memblock.h
.. kernel-doc:: mm/memblock.c
:nodocs:
:functions:

View File

@ -1,3 +1,5 @@
.. _gfp_mask_from_fs_io:
=================================
GFP masks used from FS/IO context
=================================

View File

@ -1,4 +1,4 @@
.. SPDX-License-Identifier: CC-BY-SA-4.0
.. SPDX-License-Identifier: GPL-2.0+
=============
ID Allocation

View File

@ -21,16 +21,20 @@ Core utilities
local_ops
workqueue
genericirq
xarray
flexible-arrays
librs
genalloc
errseq
printk-formats
circular-buffers
memory-allocation
mm-api
gfp_mask-from-fs-io
timekeeping
boot-time-mm
memory-hotplug
Interfaces for kernel debugging
===============================

View File

@ -0,0 +1,122 @@
=======================
Memory Allocation Guide
=======================
Linux provides a variety of APIs for memory allocation. You can
allocate small chunks using `kmalloc` or `kmem_cache_alloc` families,
large virtually contiguous areas using `vmalloc` and its derivatives,
or you can directly request pages from the page allocator with
`alloc_pages`. It is also possible to use more specialized allocators,
for instance `cma_alloc` or `zs_malloc`.
Most of the memory allocation APIs use GFP flags to express how that
memory should be allocated. The GFP acronym stands for "get free
pages", the underlying memory allocation function.
Diversity of the allocation APIs combined with the numerous GFP flags
makes the question "How should I allocate memory?" not that easy to
answer, although very likely you should use
::
kzalloc(<size>, GFP_KERNEL);
Of course there are cases when other allocation APIs and different GFP
flags must be used.
Get Free Page flags
===================
The GFP flags control the allocators behavior. They tell what memory
zones can be used, how hard the allocator should try to find free
memory, whether the memory can be accessed by the userspace etc. The
:ref:`Documentation/core-api/mm-api.rst <mm-api-gfp-flags>` provides
reference documentation for the GFP flags and their combinations and
here we briefly outline their recommended usage:
* Most of the time ``GFP_KERNEL`` is what you need. Memory for the
kernel data structures, DMAable memory, inode cache, all these and
many other allocations types can use ``GFP_KERNEL``. Note, that
using ``GFP_KERNEL`` implies ``GFP_RECLAIM``, which means that
direct reclaim may be triggered under memory pressure; the calling
context must be allowed to sleep.
* If the allocation is performed from an atomic context, e.g interrupt
handler, use ``GFP_NOWAIT``. This flag prevents direct reclaim and
IO or filesystem operations. Consequently, under memory pressure
``GFP_NOWAIT`` allocation is likely to fail. Allocations which
have a reasonable fallback should be using ``GFP_NOWARN``.
* If you think that accessing memory reserves is justified and the kernel
will be stressed unless allocation succeeds, you may use ``GFP_ATOMIC``.
* Untrusted allocations triggered from userspace should be a subject
of kmem accounting and must have ``__GFP_ACCOUNT`` bit set. There
is the handy ``GFP_KERNEL_ACCOUNT`` shortcut for ``GFP_KERNEL``
allocations that should be accounted.
* Userspace allocations should use either of the ``GFP_USER``,
``GFP_HIGHUSER`` or ``GFP_HIGHUSER_MOVABLE`` flags. The longer
the flag name the less restrictive it is.
``GFP_HIGHUSER_MOVABLE`` does not require that allocated memory
will be directly accessible by the kernel and implies that the
data is movable.
``GFP_HIGHUSER`` means that the allocated memory is not movable,
but it is not required to be directly accessible by the kernel. An
example may be a hardware allocation that maps data directly into
userspace but has no addressing limitations.
``GFP_USER`` means that the allocated memory is not movable and it
must be directly accessible by the kernel.
You may notice that quite a few allocations in the existing code
specify ``GFP_NOIO`` or ``GFP_NOFS``. Historically, they were used to
prevent recursion deadlocks caused by direct memory reclaim calling
back into the FS or IO paths and blocking on already held
resources. Since 4.12 the preferred way to address this issue is to
use new scope APIs described in
:ref:`Documentation/core-api/gfp_mask-from-fs-io.rst <gfp_mask_from_fs_io>`.
Other legacy GFP flags are ``GFP_DMA`` and ``GFP_DMA32``. They are
used to ensure that the allocated memory is accessible by hardware
with limited addressing capabilities. So unless you are writing a
driver for a device with such restrictions, avoid using these flags.
And even with hardware with restrictions it is preferable to use
`dma_alloc*` APIs.
Selecting memory allocator
==========================
The most straightforward way to allocate memory is to use a function
from the :c:func:`kmalloc` family. And, to be on the safe size it's
best to use routines that set memory to zero, like
:c:func:`kzalloc`. If you need to allocate memory for an array, there
are :c:func:`kmalloc_array` and :c:func:`kcalloc` helpers.
The maximal size of a chunk that can be allocated with `kmalloc` is
limited. The actual limit depends on the hardware and the kernel
configuration, but it is a good practice to use `kmalloc` for objects
smaller than page size.
For large allocations you can use :c:func:`vmalloc` and
:c:func:`vzalloc`, or directly request pages from the page
allocator. The memory allocated by `vmalloc` and related functions is
not physically contiguous.
If you are not sure whether the allocation size is too large for
`kmalloc`, it is possible to use :c:func:`kvmalloc` and its
derivatives. It will try to allocate memory with `kmalloc` and if the
allocation fails it will be retried with `vmalloc`. There are
restrictions on which GFP flags can be used with `kvmalloc`; please
see :c:func:`kvmalloc_node` reference documentation. Note that
`kvmalloc` may return memory that is not physically contiguous.
If you need to allocate many identical objects you can use the slab
cache allocator. The cache should be set up with
:c:func:`kmem_cache_create` before it can be used. Afterwards
:c:func:`kmem_cache_alloc` and its convenience wrappers can allocate
memory from that cache.
When the allocated memory is no longer needed it must be freed. You
can use :c:func:`kvfree` for the memory allocated with `kmalloc`,
`vmalloc` and `kvmalloc`. The slab caches should be freed with
:c:func:`kmem_cache_free`. And don't forget to destroy the cache with
:c:func:`kmem_cache_destroy`.

View File

@ -0,0 +1,125 @@
.. _memory_hotplug:
==============
Memory hotplug
==============
Memory hotplug event notifier
=============================
Hotplugging events are sent to a notification queue.
There are six types of notification defined in ``include/linux/memory.h``:
MEM_GOING_ONLINE
Generated before new memory becomes available in order to be able to
prepare subsystems to handle memory. The page allocator is still unable
to allocate from the new memory.
MEM_CANCEL_ONLINE
Generated if MEM_GOING_ONLINE fails.
MEM_ONLINE
Generated when memory has successfully brought online. The callback may
allocate pages from the new memory.
MEM_GOING_OFFLINE
Generated to begin the process of offlining memory. Allocations are no
longer possible from the memory but some of the memory to be offlined
is still in use. The callback can be used to free memory known to a
subsystem from the indicated memory block.
MEM_CANCEL_OFFLINE
Generated if MEM_GOING_OFFLINE fails. Memory is available again from
the memory block that we attempted to offline.
MEM_OFFLINE
Generated after offlining memory is complete.
A callback routine can be registered by calling::
hotplug_memory_notifier(callback_func, priority)
Callback functions with higher values of priority are called before callback
functions with lower values.
A callback function must have the following prototype::
int callback_func(
struct notifier_block *self, unsigned long action, void *arg);
The first argument of the callback function (self) is a pointer to the block
of the notifier chain that points to the callback function itself.
The second argument (action) is one of the event types described above.
The third argument (arg) passes a pointer of struct memory_notify::
struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
int status_change_nid_normal;
int status_change_nid_high;
int status_change_nid;
}
- start_pfn is start_pfn of online/offline memory.
- nr_pages is # of pages of online/offline memory.
- status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
is (will be) set/clear, if this is -1, then nodemask status is not changed.
- status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask
is (will be) set/clear, if this is -1, then nodemask status is not changed.
- status_change_nid is set node id when N_MEMORY of nodemask is (will be)
set/clear. It means a new(memoryless) node gets new memory by online and a
node loses all memory. If this is -1, then nodemask status is not changed.
If status_changed_nid* >= 0, callback should create/discard structures for the
node if necessary.
The callback routine shall return one of the values
NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
defined in ``include/linux/notifier.h``
NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.
NOTIFY_BAD is used as response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE,
MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops
further processing of the notification queue.
NOTIFY_STOP stops further processing of the notification queue.
Locking Internals
=================
When adding/removing memory that uses memory block devices (i.e. ordinary RAM),
the device_hotplug_lock should be held to:
- synchronize against online/offline requests (e.g. via sysfs). This way, memory
block devices can only be accessed (.online/.state attributes) by user
space once memory has been fully added. And when removing memory, we
know nobody is in critical sections.
- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC)
Especially, there is a possible lock inversion that is avoided using
device_hotplug_lock when adding memory and user space tries to online that
memory faster than expected:
- device_online() will first take the device_lock(), followed by
mem_hotplug_lock
- add_memory_resource() will first take the mem_hotplug_lock, followed by
the device_lock() (while creating the devices, during bus_add_device()).
As the device is visible to user space before taking the device_lock(), this
can result in a lock inversion.
onlining/offlining of memory should be done via device_online()/
device_offline() - to make sure it is properly synchronized to actions
via sysfs. Holding device_hotplug_lock is advised (to e.g. protect online_type)
When adding/removing/onlining/offlining memory or adding/removing
heterogeneous/device memory, we should always hold the mem_hotplug_lock in
write mode to serialise memory hotplug (e.g. access to global/zone
variables).
In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in read
mode allows for a quite efficient get_online_mems/put_online_mems
implementation, so code accessing memory can protect from that memory
vanishing.

View File

@ -14,6 +14,8 @@ User Space Memory Access
.. kernel-doc:: mm/util.c
:functions: get_user_pages_fast
.. _mm-api-gfp-flags:
Memory Allocation Controls
==========================

View File

@ -376,15 +376,15 @@ correctness of the format string and va_list arguments.
Passed by reference.
kobjects
--------
Device tree nodes
-----------------
::
%pOF[fnpPcCF]
For printing kobject based structs (device nodes). Default behaviour is
For printing device tree node structures. Default behaviour is
equivalent to %pOFf.
- f - device node full_name
@ -420,9 +420,8 @@ struct clk
%pC pll1
%pCn pll1
For printing struct clk structures. %pC and %pCn print the name
(Common Clock Framework) or address (legacy clock framework) of the
structure.
For printing struct clk structures. %pC and %pCn print the name of the clock
(Common Clock Framework) or a unique 32-bit ID (legacy clock framework).
Passed by reference.

View File

@ -0,0 +1,435 @@
.. SPDX-License-Identifier: GPL-2.0+
======
XArray
======
:Author: Matthew Wilcox
Overview
========
The XArray is an abstract data type which behaves like a very large array
of pointers. It meets many of the same needs as a hash or a conventional
resizable array. Unlike a hash, it allows you to sensibly go to the
next or previous entry in a cache-efficient manner. In contrast to a
resizable array, there is no need to copy data or change MMU mappings in
order to grow the array. It is more memory-efficient, parallelisable
and cache friendly than a doubly-linked list. It takes advantage of
RCU to perform lookups without locking.
The XArray implementation is efficient when the indices used are densely
clustered; hashing the object and using the hash as the index will not
perform well. The XArray is optimised for small indices, but still has
good performance with large indices. If your index can be larger than
``ULONG_MAX`` then the XArray is not the data type for you. The most
important user of the XArray is the page cache.
Each non-``NULL`` entry in the array has three bits associated with
it called marks. Each mark may be set or cleared independently of
the others. You can iterate over entries which are marked.
Normal pointers may be stored in the XArray directly. They must be 4-byte
aligned, which is true for any pointer returned from :c:func:`kmalloc` and
:c:func:`alloc_page`. It isn't true for arbitrary user-space pointers,
nor for function pointers. You can store pointers to statically allocated
objects, as long as those objects have an alignment of at least 4.
You can also store integers between 0 and ``LONG_MAX`` in the XArray.
You must first convert it into an entry using :c:func:`xa_mk_value`.
When you retrieve an entry from the XArray, you can check whether it is
a value entry by calling :c:func:`xa_is_value`, and convert it back to
an integer by calling :c:func:`xa_to_value`.
Some users want to store tagged pointers instead of using the marks
described above. They can call :c:func:`xa_tag_pointer` to create an
entry with a tag, :c:func:`xa_untag_pointer` to turn a tagged entry
back into an untagged pointer and :c:func:`xa_pointer_tag` to retrieve
the tag of an entry. Tagged pointers use the same bits that are used
to distinguish value entries from normal pointers, so each user must
decide whether they want to store value entries or tagged pointers in
any particular XArray.
The XArray does not support storing :c:func:`IS_ERR` pointers as some
conflict with value entries or internal entries.
An unusual feature of the XArray is the ability to create entries which
occupy a range of indices. Once stored to, looking up any index in
the range will return the same entry as looking up any other index in
the range. Setting a mark on one index will set it on all of them.
Storing to any index will store to all of them. Multi-index entries can
be explicitly split into smaller entries, or storing ``NULL`` into any
entry will cause the XArray to forget about the range.
Normal API
==========
Start by initialising an XArray, either with :c:func:`DEFINE_XARRAY`
for statically allocated XArrays or :c:func:`xa_init` for dynamically
allocated ones. A freshly-initialised XArray contains a ``NULL``
pointer at every index.
You can then set entries using :c:func:`xa_store` and get entries
using :c:func:`xa_load`. xa_store will overwrite any entry with the
new entry and return the previous entry stored at that index. You can
use :c:func:`xa_erase` instead of calling :c:func:`xa_store` with a
``NULL`` entry. There is no difference between an entry that has never
been stored to and one that has most recently had ``NULL`` stored to it.
You can conditionally replace an entry at an index by using
:c:func:`xa_cmpxchg`. Like :c:func:`cmpxchg`, it will only succeed if
the entry at that index has the 'old' value. It also returns the entry
which was at that index; if it returns the same entry which was passed as
'old', then :c:func:`xa_cmpxchg` succeeded.
If you want to only store a new entry to an index if the current entry
at that index is ``NULL``, you can use :c:func:`xa_insert` which
returns ``-EEXIST`` if the entry is not empty.
You can enquire whether a mark is set on an entry by using
:c:func:`xa_get_mark`. If the entry is not ``NULL``, you can set a mark
on it by using :c:func:`xa_set_mark` and remove the mark from an entry by
calling :c:func:`xa_clear_mark`. You can ask whether any entry in the
XArray has a particular mark set by calling :c:func:`xa_marked`.
You can copy entries out of the XArray into a plain array by calling
:c:func:`xa_extract`. Or you can iterate over the present entries in
the XArray by calling :c:func:`xa_for_each`. You may prefer to use
:c:func:`xa_find` or :c:func:`xa_find_after` to move to the next present
entry in the XArray.
Calling :c:func:`xa_store_range` stores the same entry in a range
of indices. If you do this, some of the other operations will behave
in a slightly odd way. For example, marking the entry at one index
may result in the entry being marked at some, but not all of the other
indices. Storing into one index may result in the entry retrieved by
some, but not all of the other indices changing.
Finally, you can remove all entries from an XArray by calling
:c:func:`xa_destroy`. If the XArray entries are pointers, you may wish
to free the entries first. You can do this by iterating over all present
entries in the XArray using the :c:func:`xa_for_each` iterator.
ID assignment
-------------
You can call :c:func:`xa_alloc` to store the entry at any unused index
in the XArray. If you need to modify the array from interrupt context,
you can use :c:func:`xa_alloc_bh` or :c:func:`xa_alloc_irq` to disable
interrupts while allocating the ID. Unlike :c:func:`xa_store`, allocating
a ``NULL`` pointer does not delete an entry. Instead it reserves an
entry like :c:func:`xa_reserve` and you can release it using either
:c:func:`xa_erase` or :c:func:`xa_release`. To use ID assignment, the
XArray must be defined with :c:func:`DEFINE_XARRAY_ALLOC`, or initialised
by passing ``XA_FLAGS_ALLOC`` to :c:func:`xa_init_flags`,
Memory allocation
-----------------
The :c:func:`xa_store`, :c:func:`xa_cmpxchg`, :c:func:`xa_alloc`,
:c:func:`xa_reserve` and :c:func:`xa_insert` functions take a gfp_t
parameter in case the XArray needs to allocate memory to store this entry.
If the entry is being deleted, no memory allocation needs to be performed,
and the GFP flags specified will be ignored.
It is possible for no memory to be allocatable, particularly if you pass
a restrictive set of GFP flags. In that case, the functions return a
special value which can be turned into an errno using :c:func:`xa_err`.
If you don't need to know exactly which error occurred, using
:c:func:`xa_is_err` is slightly more efficient.
Locking
-------
When using the Normal API, you do not have to worry about locking.
The XArray uses RCU and an internal spinlock to synchronise access:
No lock needed:
* :c:func:`xa_empty`
* :c:func:`xa_marked`
Takes RCU read lock:
* :c:func:`xa_load`
* :c:func:`xa_for_each`
* :c:func:`xa_find`
* :c:func:`xa_find_after`
* :c:func:`xa_extract`
* :c:func:`xa_get_mark`
Takes xa_lock internally:
* :c:func:`xa_store`
* :c:func:`xa_insert`
* :c:func:`xa_erase`
* :c:func:`xa_erase_bh`
* :c:func:`xa_erase_irq`
* :c:func:`xa_cmpxchg`
* :c:func:`xa_store_range`
* :c:func:`xa_alloc`
* :c:func:`xa_alloc_bh`
* :c:func:`xa_alloc_irq`
* :c:func:`xa_destroy`
* :c:func:`xa_set_mark`
* :c:func:`xa_clear_mark`
Assumes xa_lock held on entry:
* :c:func:`__xa_store`
* :c:func:`__xa_insert`
* :c:func:`__xa_erase`
* :c:func:`__xa_cmpxchg`
* :c:func:`__xa_alloc`
* :c:func:`__xa_set_mark`
* :c:func:`__xa_clear_mark`
If you want to take advantage of the lock to protect the data structures
that you are storing in the XArray, you can call :c:func:`xa_lock`
before calling :c:func:`xa_load`, then take a reference count on the
object you have found before calling :c:func:`xa_unlock`. This will
prevent stores from removing the object from the array between looking
up the object and incrementing the refcount. You can also use RCU to
avoid dereferencing freed memory, but an explanation of that is beyond
the scope of this document.
The XArray does not disable interrupts or softirqs while modifying
the array. It is safe to read the XArray from interrupt or softirq
context as the RCU lock provides enough protection.
If, for example, you want to store entries in the XArray in process
context and then erase them in softirq context, you can do that this way::
void foo_init(struct foo *foo)
{
xa_init_flags(&foo->array, XA_FLAGS_LOCK_BH);
}
int foo_store(struct foo *foo, unsigned long index, void *entry)
{
int err;
xa_lock_bh(&foo->array);
err = xa_err(__xa_store(&foo->array, index, entry, GFP_KERNEL));
if (!err)
foo->count++;
xa_unlock_bh(&foo->array);
return err;
}
/* foo_erase() is only called from softirq context */
void foo_erase(struct foo *foo, unsigned long index)
{
xa_lock(&foo->array);
__xa_erase(&foo->array, index);
foo->count--;
xa_unlock(&foo->array);
}
If you are going to modify the XArray from interrupt or softirq context,
you need to initialise the array using :c:func:`xa_init_flags`, passing
``XA_FLAGS_LOCK_IRQ`` or ``XA_FLAGS_LOCK_BH``.
The above example also shows a common pattern of wanting to extend the
coverage of the xa_lock on the store side to protect some statistics
associated with the array.
Sharing the XArray with interrupt context is also possible, either
using :c:func:`xa_lock_irqsave` in both the interrupt handler and process
context, or :c:func:`xa_lock_irq` in process context and :c:func:`xa_lock`
in the interrupt handler. Some of the more common patterns have helper
functions such as :c:func:`xa_erase_bh` and :c:func:`xa_erase_irq`.
Sometimes you need to protect access to the XArray with a mutex because
that lock sits above another mutex in the locking hierarchy. That does
not entitle you to use functions like :c:func:`__xa_erase` without taking
the xa_lock; the xa_lock is used for lockdep validation and will be used
for other purposes in the future.
The :c:func:`__xa_set_mark` and :c:func:`__xa_clear_mark` functions are also
available for situations where you look up an entry and want to atomically
set or clear a mark. It may be more efficient to use the advanced API
in this case, as it will save you from walking the tree twice.
Advanced API
============
The advanced API offers more flexibility and better performance at the
cost of an interface which can be harder to use and has fewer safeguards.
No locking is done for you by the advanced API, and you are required
to use the xa_lock while modifying the array. You can choose whether
to use the xa_lock or the RCU lock while doing read-only operations on
the array. You can mix advanced and normal operations on the same array;
indeed the normal API is implemented in terms of the advanced API. The
advanced API is only available to modules with a GPL-compatible license.
The advanced API is based around the xa_state. This is an opaque data
structure which you declare on the stack using the :c:func:`XA_STATE`
macro. This macro initialises the xa_state ready to start walking
around the XArray. It is used as a cursor to maintain the position
in the XArray and let you compose various operations together without
having to restart from the top every time.
The xa_state is also used to store errors. You can call
:c:func:`xas_error` to retrieve the error. All operations check whether
the xa_state is in an error state before proceeding, so there's no need
for you to check for an error after each call; you can make multiple
calls in succession and only check at a convenient point. The only
errors currently generated by the XArray code itself are ``ENOMEM`` and
``EINVAL``, but it supports arbitrary errors in case you want to call
:c:func:`xas_set_err` yourself.
If the xa_state is holding an ``ENOMEM`` error, calling :c:func:`xas_nomem`
will attempt to allocate more memory using the specified gfp flags and
cache it in the xa_state for the next attempt. The idea is that you take
the xa_lock, attempt the operation and drop the lock. The operation
attempts to allocate memory while holding the lock, but it is more
likely to fail. Once you have dropped the lock, :c:func:`xas_nomem`
can try harder to allocate more memory. It will return ``true`` if it
is worth retrying the operation (i.e. that there was a memory error *and*
more memory was allocated). If it has previously allocated memory, and
that memory wasn't used, and there is no error (or some error that isn't
``ENOMEM``), then it will free the memory previously allocated.
Internal Entries
----------------
The XArray reserves some entries for its own purposes. These are never
exposed through the normal API, but when using the advanced API, it's
possible to see them. Usually the best way to handle them is to pass them
to :c:func:`xas_retry`, and retry the operation if it returns ``true``.
.. flat-table::
:widths: 1 1 6
* - Name
- Test
- Usage
* - Node
- :c:func:`xa_is_node`
- An XArray node. May be visible when using a multi-index xa_state.
* - Sibling
- :c:func:`xa_is_sibling`
- A non-canonical entry for a multi-index entry. The value indicates
which slot in this node has the canonical entry.
* - Retry
- :c:func:`xa_is_retry`
- This entry is currently being modified by a thread which has the
xa_lock. The node containing this entry may be freed at the end
of this RCU period. You should restart the lookup from the head
of the array.
* - Zero
- :c:func:`xa_is_zero`
- Zero entries appear as ``NULL`` through the Normal API, but occupy
an entry in the XArray which can be used to reserve the index for
future use.
Other internal entries may be added in the future. As far as possible, they
will be handled by :c:func:`xas_retry`.
Additional functionality
------------------------
The :c:func:`xas_create_range` function allocates all the necessary memory
to store every entry in a range. It will set ENOMEM in the xa_state if
it cannot allocate memory.
You can use :c:func:`xas_init_marks` to reset the marks on an entry
to their default state. This is usually all marks clear, unless the
XArray is marked with ``XA_FLAGS_TRACK_FREE``, in which case mark 0 is set
and all other marks are clear. Replacing one entry with another using
:c:func:`xas_store` will not reset the marks on that entry; if you want
the marks reset, you should do that explicitly.
The :c:func:`xas_load` will walk the xa_state as close to the entry
as it can. If you know the xa_state has already been walked to the
entry and need to check that the entry hasn't changed, you can use
:c:func:`xas_reload` to save a function call.
If you need to move to a different index in the XArray, call
:c:func:`xas_set`. This resets the cursor to the top of the tree, which
will generally make the next operation walk the cursor to the desired
spot in the tree. If you want to move to the next or previous index,
call :c:func:`xas_next` or :c:func:`xas_prev`. Setting the index does
not walk the cursor around the array so does not require a lock to be
held, while moving to the next or previous index does.
You can search for the next present entry using :c:func:`xas_find`. This
is the equivalent of both :c:func:`xa_find` and :c:func:`xa_find_after`;
if the cursor has been walked to an entry, then it will find the next
entry after the one currently referenced. If not, it will return the
entry at the index of the xa_state. Using :c:func:`xas_next_entry` to
move to the next present entry instead of :c:func:`xas_find` will save
a function call in the majority of cases at the expense of emitting more
inline code.
The :c:func:`xas_find_marked` function is similar. If the xa_state has
not been walked, it will return the entry at the index of the xa_state,
if it is marked. Otherwise, it will return the first marked entry after
the entry referenced by the xa_state. The :c:func:`xas_next_marked`
function is the equivalent of :c:func:`xas_next_entry`.
When iterating over a range of the XArray using :c:func:`xas_for_each`
or :c:func:`xas_for_each_marked`, it may be necessary to temporarily stop
the iteration. The :c:func:`xas_pause` function exists for this purpose.
After you have done the necessary work and wish to resume, the xa_state
is in an appropriate state to continue the iteration after the entry
you last processed. If you have interrupts disabled while iterating,
then it is good manners to pause the iteration and reenable interrupts
every ``XA_CHECK_SCHED`` entries.
The :c:func:`xas_get_mark`, :c:func:`xas_set_mark` and
:c:func:`xas_clear_mark` functions require the xa_state cursor to have
been moved to the appropriate location in the xarray; they will do
nothing if you have called :c:func:`xas_pause` or :c:func:`xas_set`
immediately before.
You can call :c:func:`xas_set_update` to have a callback function
called each time the XArray updates a node. This is used by the page
cache workingset code to maintain its list of nodes which contain only
shadow entries.
Multi-Index Entries
-------------------
The XArray has the ability to tie multiple indices together so that
operations on one index affect all indices. For example, storing into
any index will change the value of the entry retrieved from any index.
Setting or clearing a mark on any index will set or clear the mark
on every index that is tied together. The current implementation
only allows tying ranges which are aligned powers of two together;
eg indices 64-127 may be tied together, but 2-6 may not be. This may
save substantial quantities of memory; for example tying 512 entries
together will save over 4kB.
You can create a multi-index entry by using :c:func:`XA_STATE_ORDER`
or :c:func:`xas_set_order` followed by a call to :c:func:`xas_store`.
Calling :c:func:`xas_load` with a multi-index xa_state will walk the
xa_state to the right location in the tree, but the return value is not
meaningful, potentially being an internal entry or ``NULL`` even when there
is an entry stored within the range. Calling :c:func:`xas_find_conflict`
will return the first entry within the range or ``NULL`` if there are no
entries in the range. The :c:func:`xas_for_each_conflict` iterator will
iterate over every entry which overlaps the specified range.
If :c:func:`xas_load` encounters a multi-index entry, the xa_index
in the xa_state will not be changed. When iterating over an XArray
or calling :c:func:`xas_find`, if the initial index is in the middle
of a multi-index entry, it will not be altered. Subsequent calls
or iterations will move the index to the first index in the range.
Each entry will only be returned once, no matter how many indices it
occupies.
Using :c:func:`xas_next` or :c:func:`xas_prev` with a multi-index xa_state
is not supported. Using either of these functions on a multi-index entry
will reveal sibling entries; these should be skipped over by the caller.
Storing ``NULL`` into any index of a multi-index entry will set the entry
at every index to ``NULL`` and dissolve the tie. Splitting a multi-index
entry into entries occupying smaller ranges is not yet supported.
Functions and structures
========================
.. kernel-doc:: include/linux/xarray.h
.. kernel-doc:: lib/xarray.c

View File

@ -183,6 +183,10 @@ and looks like the following:
void (*describe)(const struct key *key, struct seq_file *m);
void (*destroy)(void *payload);
int (*query)(const struct kernel_pkey_params *params,
struct kernel_pkey_query *info);
int (*eds_op)(struct kernel_pkey_params *params,
const void *in, void *out);
int (*verify_signature)(const struct key *key,
const struct public_key_signature *sig);
};
@ -207,12 +211,22 @@ There are a number of operations defined by the subtype:
asymmetric key will look after freeing the fingerprint and releasing the
reference on the subtype module.
(3) verify_signature().
(3) query().
Optional. These are the entry points for the key usage operations.
Currently there is only the one defined. If not set, the caller will be
given -ENOTSUPP. The subtype may do anything it likes to implement an
operation, including offloading to hardware.
Mandatory. This is a function for querying the capabilities of a key.
(4) eds_op().
Optional. This is the entry point for the encryption, decryption and
signature creation operations (which are distinguished by the operation ID
in the parameter struct). The subtype may do anything it likes to
implement an operation, including offloading to hardware.
(5) verify_signature().
Optional. This is the entry point for signature verification. The
subtype may do anything it likes to implement an operation, including
offloading to hardware.
==========================
@ -234,6 +248,8 @@ Examples of blob formats for which parsers could be implemented include:
- X.509 ASN.1 stream.
- Pointer to TPM key.
- Pointer to UEFI key.
- PKCS#8 private key [RFC 5208].
- PKCS#5 encrypted private key [RFC 2898].
During key instantiation each parser in the list is tried until one doesn't
return -EBADMSG.

View File

@ -30,18 +30,29 @@ of many distributions, e.g. :
- NetBSD
- FreeBSD
You can get the latest version released from the Coccinelle homepage at
Some distribution packages are obsolete and it is recommended
to use the latest version released from the Coccinelle homepage at
http://coccinelle.lip6.fr/
Once you have it, run the following command::
Or from Github at:
./configure
https://github.com/coccinelle/coccinelle
Once you have it, run the following commands::
./autogen
./configure
make
as a regular user, and install it with::
sudo make install
More detailed installation instructions to build from source can be
found at:
https://github.com/coccinelle/coccinelle/blob/master/install.txt
Supplemental documentation
---------------------------
@ -51,6 +62,10 @@ https://bottest.wiki.kernel.org/coccicheck
The wiki documentation always refers to the linux-next version of the script.
For Semantic Patch Language(SmPL) grammar documentation refer to:
http://coccinelle.lip6.fr/documentation.php
Using Coccinelle on the Linux kernel
------------------------------------
@ -223,7 +238,7 @@ Since coccicheck runs through make, it naturally runs from the kernel
proper dir, as such the second rule above would be implied for picking up a
.cocciconfig when using ``make coccicheck``.
``make coccicheck`` also supports using M= targets.If you do not supply
``make coccicheck`` also supports using M= targets. If you do not supply
any M= target, it is assumed you want to target the entire kernel.
The kernel coccicheck script has::

View File

@ -159,7 +159,7 @@ Contributing new tests (details)
* If a test needs specific kernel config options enabled, add a config file in
the test directory to enable them.
e.g: tools/testing/selftests/android/ion/config
e.g: tools/testing/selftests/android/config
Test Harness
============

View File

@ -33,6 +33,10 @@ Optional feature parameters:
All write I/O is silently ignored.
Read I/O is handled correctly.
error_writes:
All write I/O is failed with an error signalled.
Read I/O is handled correctly.
corrupt_bio_byte <Nth_byte> <direction> <value> <flags>:
During <down interval>, replace <Nth_byte> of the data of
each matching bio with <value>.

View File

@ -38,7 +38,7 @@ inconsistent file system.
Any REQ_FUA requests bypass this flushing mechanism and are logged as soon as
they complete as those requests will obviously bypass the device cache.
Any REQ_DISCARD requests are treated like WRITE requests. Otherwise we would
Any REQ_OP_DISCARD requests are treated like WRITE requests. Otherwise we would
have all the DISCARD requests, and then the WRITE requests and then the FLUSH
request. Consider the following example:

View File

@ -1,12 +0,0 @@
Documentation for device trees, a data structure by which bootloaders pass
hardware layout to Linux in a device-independent manner, simplifying hardware
probing. This subsystem is maintained by Grant Likely
<grant.likely@secretlab.ca> and has a mailing list at
https://lists.ozlabs.org/listinfo/devicetree-discuss
00-INDEX
- this file
booting-without-of.txt
- Booting Linux without Open Firmware, describes history and format of device trees.
usage-model.txt
- How Linux uses DT and what DT aims to solve.

View File

@ -14,75 +14,3 @@ compatible: must contain "al,alpine"
...
}
* CPU node:
The Alpine platform includes cortex-a15 cores.
enable-method: must be "al,alpine-smp" to allow smp [1]
Example:
cpus {
#address-cells = <1>;
#size-cells = <0>;
enable-method = "al,alpine-smp";
cpu@0 {
compatible = "arm,cortex-a15";
device_type = "cpu";
reg = <0>;
};
cpu@1 {
compatible = "arm,cortex-a15";
device_type = "cpu";
reg = <1>;
};
cpu@2 {
compatible = "arm,cortex-a15";
device_type = "cpu";
reg = <2>;
};
cpu@3 {
compatible = "arm,cortex-a15";
device_type = "cpu";
reg = <3>;
};
};
* Alpine CPU resume registers
The CPU resume register are used to define required resume address after
reset.
Properties:
- compatible : Should contain "al,alpine-cpu-resume".
- reg : Offset and length of the register set for the device
Example:
cpu_resume {
compatible = "al,alpine-cpu-resume";
reg = <0xfbff5ed0 0x30>;
};
* Alpine System-Fabric Service Registers
The System-Fabric Service Registers allow various operation on CPU and
system fabric, like powering CPUs off.
Properties:
- compatible : Should contain "al,alpine-sysfabric-service" and "syscon".
- reg : Offset and length of the register set for the device
Example:
nb_service {
compatible = "al,alpine-sysfabric-service", "syscon";
reg = <0xfb070000 0x10000>;
};
[1] arm/cpu-enable-method/al,alpine-smp

View File

@ -57,12 +57,17 @@ Boards with the Amlogic Meson AXG A113D SoC shall have the following properties:
Required root node property:
compatible: "amlogic,a113d", "amlogic,meson-axg";
Boards with the Amlogic Meson G12A S905D2 SoC shall have the following properties:
Required root node property:
compatible: "amlogic,g12a";
Board compatible values (alphabetically, grouped by SoC):
- "geniatech,atv1200" (Meson6)
- "minix,neo-x8" (Meson8)
- "endless,ec100" (Meson8b)
- "hardkernel,odroid-c1" (Meson8b)
- "tronfy,mxq" (Meson8b)
@ -101,6 +106,8 @@ Board compatible values (alphabetically, grouped by SoC):
- "amlogic,s400" (Meson axg a113d)
- "amlogic,u200" (Meson g12a s905d2)
Amlogic Meson Firmware registers Interface
------------------------------------------

View File

@ -70,173 +70,3 @@ compatible: must be one of:
- "atmel,samv71q19"
- "atmel,samv71q20"
- "atmel,samv71q21"
Chipid required properties:
- compatible: Should be "atmel,sama5d2-chipid"
- reg : Should contain registers location and length
PIT Timer required properties:
- compatible: Should be "atmel,at91sam9260-pit"
- reg: Should contain registers location and length
- interrupts: Should contain interrupt for the PIT which is the IRQ line
shared across all System Controller members.
System Timer (ST) required properties:
- compatible: Should be "atmel,at91rm9200-st", "syscon", "simple-mfd"
- reg: Should contain registers location and length
- interrupts: Should contain interrupt for the ST which is the IRQ line
shared across all System Controller members.
- clocks: phandle to input clock.
Its subnodes can be:
- watchdog: compatible should be "atmel,at91rm9200-wdt"
RSTC Reset Controller required properties:
- compatible: Should be "atmel,<chip>-rstc".
<chip> can be "at91sam9260" or "at91sam9g45" or "sama5d3"
- reg: Should contain registers location and length
- clocks: phandle to input clock.
Example:
rstc@fffffd00 {
compatible = "atmel,at91sam9260-rstc";
reg = <0xfffffd00 0x10>;
clocks = <&clk32k>;
};
RAMC SDRAM/DDR Controller required properties:
- compatible: Should be "atmel,at91rm9200-sdramc", "syscon"
"atmel,at91sam9260-sdramc",
"atmel,at91sam9g45-ddramc",
"atmel,sama5d3-ddramc",
- reg: Should contain registers location and length
Examples:
ramc0: ramc@ffffe800 {
compatible = "atmel,at91sam9g45-ddramc";
reg = <0xffffe800 0x200>;
};
SHDWC Shutdown Controller
required properties:
- compatible: Should be "atmel,<chip>-shdwc".
<chip> can be "at91sam9260", "at91sam9rl" or "at91sam9x5".
- reg: Should contain registers location and length
- clocks: phandle to input clock.
optional properties:
- atmel,wakeup-mode: String, operation mode of the wakeup mode.
Supported values are: "none", "high", "low", "any".
- atmel,wakeup-counter: Counter on Wake-up 0 (between 0x0 and 0xf).
optional at91sam9260 properties:
- atmel,wakeup-rtt-timer: boolean to enable Real-time Timer Wake-up.
optional at91sam9rl properties:
- atmel,wakeup-rtc-timer: boolean to enable Real-time Clock Wake-up.
- atmel,wakeup-rtt-timer: boolean to enable Real-time Timer Wake-up.
optional at91sam9x5 properties:
- atmel,wakeup-rtc-timer: boolean to enable Real-time Clock Wake-up.
Example:
shdwc@fffffd10 {
compatible = "atmel,at91sam9260-shdwc";
reg = <0xfffffd10 0x10>;
clocks = <&clk32k>;
};
SHDWC SAMA5D2-Compatible Shutdown Controller
1) shdwc node
required properties:
- compatible: should be "atmel,sama5d2-shdwc".
- reg: should contain registers location and length
- clocks: phandle to input clock.
- #address-cells: should be one. The cell is the wake-up input index.
- #size-cells: should be zero.
optional properties:
- debounce-delay-us: minimum wake-up inputs debouncer period in
microseconds. It's usually a board-related property.
- atmel,wakeup-rtc-timer: boolean to enable Real-Time Clock wake-up.
The node contains child nodes for each wake-up input that the platform uses.
2) input nodes
Wake-up input nodes are usually described in the "board" part of the Device
Tree. Note also that input 0 is linked to the wake-up pin and is frequently
used.
Required properties:
- reg: should contain the wake-up input index [0 - 15].
Optional properties:
- atmel,wakeup-active-high: boolean, the corresponding wake-up input described
by the child, forces the wake-up of the core power supply on a high level.
The default is to be active low.
Example:
On the SoC side:
shdwc@f8048010 {
compatible = "atmel,sama5d2-shdwc";
reg = <0xf8048010 0x10>;
clocks = <&clk32k>;
#address-cells = <1>;
#size-cells = <0>;
atmel,wakeup-rtc-timer;
};
On the board side:
shdwc@f8048010 {
debounce-delay-us = <976>;
input@0 {
reg = <0>;
};
input@1 {
reg = <1>;
atmel,wakeup-active-high;
};
};
Special Function Registers (SFR)
Special Function Registers (SFR) manage specific aspects of the integrated
memory, bridge implementations, processor and other functionality not controlled
elsewhere.
required properties:
- compatible: Should be "atmel,<chip>-sfr", "syscon" or
"atmel,<chip>-sfrbu", "syscon"
<chip> can be "sama5d3", "sama5d4" or "sama5d2".
- reg: Should contain registers location and length
sfr@f0038000 {
compatible = "atmel,sama5d3-sfr", "syscon";
reg = <0xf0038000 0x60>;
};
Security Module (SECUMOD)
The Security Module macrocell provides all necessary secure functions to avoid
voltage, temperature, frequency and mechanical attacks on the chip. It also
embeds secure memories that can be scrambled
required properties:
- compatible: Should be "atmel,<chip>-secumod", "syscon".
<chip> can be "sama5d2".
- reg: Should contain registers location and length
secumod@fc040000 {
compatible = "atmel,sama5d2-secumod", "syscon";
reg = <0xfc040000 0x100>;
};

View File

@ -0,0 +1,171 @@
Atmel system registers
Chipid required properties:
- compatible: Should be "atmel,sama5d2-chipid"
- reg : Should contain registers location and length
PIT Timer required properties:
- compatible: Should be "atmel,at91sam9260-pit"
- reg: Should contain registers location and length
- interrupts: Should contain interrupt for the PIT which is the IRQ line
shared across all System Controller members.
System Timer (ST) required properties:
- compatible: Should be "atmel,at91rm9200-st", "syscon", "simple-mfd"
- reg: Should contain registers location and length
- interrupts: Should contain interrupt for the ST which is the IRQ line
shared across all System Controller members.
- clocks: phandle to input clock.
Its subnodes can be:
- watchdog: compatible should be "atmel,at91rm9200-wdt"
RSTC Reset Controller required properties:
- compatible: Should be "atmel,<chip>-rstc".
<chip> can be "at91sam9260" or "at91sam9g45" or "sama5d3"
- reg: Should contain registers location and length
- clocks: phandle to input clock.
Example:
rstc@fffffd00 {
compatible = "atmel,at91sam9260-rstc";
reg = <0xfffffd00 0x10>;
clocks = <&clk32k>;
};
RAMC SDRAM/DDR Controller required properties:
- compatible: Should be "atmel,at91rm9200-sdramc", "syscon"
"atmel,at91sam9260-sdramc",
"atmel,at91sam9g45-ddramc",
"atmel,sama5d3-ddramc",
- reg: Should contain registers location and length
Examples:
ramc0: ramc@ffffe800 {
compatible = "atmel,at91sam9g45-ddramc";
reg = <0xffffe800 0x200>;
};
SHDWC Shutdown Controller
required properties:
- compatible: Should be "atmel,<chip>-shdwc".
<chip> can be "at91sam9260", "at91sam9rl" or "at91sam9x5".
- reg: Should contain registers location and length
- clocks: phandle to input clock.
optional properties:
- atmel,wakeup-mode: String, operation mode of the wakeup mode.
Supported values are: "none", "high", "low", "any".
- atmel,wakeup-counter: Counter on Wake-up 0 (between 0x0 and 0xf).
optional at91sam9260 properties:
- atmel,wakeup-rtt-timer: boolean to enable Real-time Timer Wake-up.
optional at91sam9rl properties:
- atmel,wakeup-rtc-timer: boolean to enable Real-time Clock Wake-up.
- atmel,wakeup-rtt-timer: boolean to enable Real-time Timer Wake-up.
optional at91sam9x5 properties:
- atmel,wakeup-rtc-timer: boolean to enable Real-time Clock Wake-up.
Example:
shdwc@fffffd10 {
compatible = "atmel,at91sam9260-shdwc";
reg = <0xfffffd10 0x10>;
clocks = <&clk32k>;
};
SHDWC SAMA5D2-Compatible Shutdown Controller
1) shdwc node
required properties:
- compatible: should be "atmel,sama5d2-shdwc".
- reg: should contain registers location and length
- clocks: phandle to input clock.
- #address-cells: should be one. The cell is the wake-up input index.
- #size-cells: should be zero.
optional properties:
- debounce-delay-us: minimum wake-up inputs debouncer period in
microseconds. It's usually a board-related property.
- atmel,wakeup-rtc-timer: boolean to enable Real-Time Clock wake-up.
The node contains child nodes for each wake-up input that the platform uses.
2) input nodes
Wake-up input nodes are usually described in the "board" part of the Device
Tree. Note also that input 0 is linked to the wake-up pin and is frequently
used.
Required properties:
- reg: should contain the wake-up input index [0 - 15].
Optional properties:
- atmel,wakeup-active-high: boolean, the corresponding wake-up input described
by the child, forces the wake-up of the core power supply on a high level.
The default is to be active low.
Example:
On the SoC side:
shdwc@f8048010 {
compatible = "atmel,sama5d2-shdwc";
reg = <0xf8048010 0x10>;
clocks = <&clk32k>;
#address-cells = <1>;
#size-cells = <0>;
atmel,wakeup-rtc-timer;
};
On the board side:
shdwc@f8048010 {
debounce-delay-us = <976>;
input@0 {
reg = <0>;
};
input@1 {
reg = <1>;
atmel,wakeup-active-high;
};
};
Special Function Registers (SFR)
Special Function Registers (SFR) manage specific aspects of the integrated
memory, bridge implementations, processor and other functionality not controlled
elsewhere.
required properties:
- compatible: Should be "atmel,<chip>-sfr", "syscon" or
"atmel,<chip>-sfrbu", "syscon"
<chip> can be "sama5d3", "sama5d4" or "sama5d2".
- reg: Should contain registers location and length
sfr@f0038000 {
compatible = "atmel,sama5d3-sfr", "syscon";
reg = <0xf0038000 0x60>;
};
Security Module (SECUMOD)
The Security Module macrocell provides all necessary secure functions to avoid
voltage, temperature, frequency and mechanical attacks on the chip. It also
embeds secure memories that can be scrambled
required properties:
- compatible: Should be "atmel,<chip>-secumod", "syscon".
<chip> can be "sama5d2".
- reg: Should contain registers location and length
secumod@fc040000 {
compatible = "atmel,sama5d2-secumod", "syscon";
reg = <0xfc040000 0x100>;
};

View File

@ -42,6 +42,14 @@ Raspberry Pi Compute Module
Required root node properties:
compatible = "raspberrypi,compute-module", "brcm,bcm2835";
Raspberry Pi Compute Module 3
Required root node properties:
compatible = "raspberrypi,3-compute-module", "brcm,bcm2837";
Raspberry Pi Compute Module 3 Lite
Required root node properties:
compatible = "raspberrypi,3-compute-module-lite", "brcm,bcm2837";
Raspberry Pi Zero
Required root node properties:
compatible = "raspberrypi,model-zero", "brcm,bcm2835";

View File

@ -54,9 +54,7 @@ its hardware characteristcs.
clocks the core of that coresight component. The latter clock
is optional.
* port or ports: The representation of the component's port
layout using the generic DT graph presentation found in
"bindings/graph.txt".
* port or ports: see "Graph bindings for Coresight" below.
* Additional required properties for System Trace Macrocells (STM):
* reg: along with the physical base address and length of the register
@ -73,7 +71,7 @@ its hardware characteristcs.
AMBA markee):
- "arm,coresight-replicator"
* port or ports: same as above.
* port or ports: see "Graph bindings for Coresight" below.
* Optional properties for ETM/PTMs:
@ -96,6 +94,20 @@ its hardware characteristcs.
* interrupts : Exactly one SPI may be listed for reporting the address
error
Graph bindings for Coresight
-------------------------------
Coresight components are interconnected to create a data path for the flow of
trace data generated from the "sources" to their collection points "sink".
Each coresight component must describe the "input" and "output" connections.
The connections must be described via generic DT graph bindings as described
by the "bindings/graph.txt", where each "port" along with an "endpoint"
component represents a hardware port and the connection.
* All output ports must be listed inside a child node named "out-ports"
* All input ports must be listed inside a child node named "in-ports".
* Port address must match the hardware port number.
Example:
1. Sinks
@ -105,10 +117,11 @@ Example:
clocks = <&oscclk6a>;
clock-names = "apb_pclk";
port {
etb_in_port: endpoint@0 {
slave-mode;
remote-endpoint = <&replicator_out_port0>;
in-ports {
port {
etb_in_port: endpoint@0 {
remote-endpoint = <&replicator_out_port0>;
};
};
};
};
@ -119,10 +132,11 @@ Example:
clocks = <&oscclk6a>;
clock-names = "apb_pclk";
port {
tpiu_in_port: endpoint@0 {
slave-mode;
remote-endpoint = <&replicator_out_port1>;
in-ports {
port {
tpiu_in_port: endpoint@0 {
remote-endpoint = <&replicator_out_port1>;
};
};
};
};
@ -133,22 +147,16 @@ Example:
clocks = <&oscclk6a>;
clock-names = "apb_pclk";
ports {
#address-cells = <1>;
#size-cells = <0>;
/* input port */
port@0 {
reg = <0>;
in-ports {
port {
etr_in_port: endpoint {
slave-mode;
remote-endpoint = <&replicator2_out_port0>;
};
};
};
/* CATU link represented by output port */
port@1 {
reg = <1>;
out-ports {
port {
etr_out_port: endpoint {
remote-endpoint = <&catu_in_port>;
};
@ -163,7 +171,7 @@ Example:
*/
compatible = "arm,coresight-replicator";
ports {
out-ports {
#address-cells = <1>;
#size-cells = <0>;
@ -181,12 +189,11 @@ Example:
remote-endpoint = <&tpiu_in_port>;
};
};
};
/* replicator input port */
port@2 {
reg = <0>;
in-ports {
port {
replicator_in_port0: endpoint {
slave-mode;
remote-endpoint = <&funnel_out_port0>;
};
};
@ -199,40 +206,36 @@ Example:
clocks = <&oscclk6a>;
clock-names = "apb_pclk";
ports {
#address-cells = <1>;
#size-cells = <0>;
/* funnel output port */
port@0 {
reg = <0>;
out-ports {
port {
funnel_out_port0: endpoint {
remote-endpoint =
<&replicator_in_port0>;
};
};
};
/* funnel input ports */
port@1 {
in-ports {
#address-cells = <1>;
#size-cells = <0>;
port@0 {
reg = <0>;
funnel_in_port0: endpoint {
slave-mode;
remote-endpoint = <&ptm0_out_port>;
};
};
port@2 {
port@1 {
reg = <1>;
funnel_in_port1: endpoint {
slave-mode;
remote-endpoint = <&ptm1_out_port>;
};
};
port@3 {
port@2 {
reg = <2>;
funnel_in_port2: endpoint {
slave-mode;
remote-endpoint = <&etm0_out_port>;
};
};
@ -248,9 +251,11 @@ Example:
cpu = <&cpu0>;
clocks = <&oscclk6a>;
clock-names = "apb_pclk";
port {
ptm0_out_port: endpoint {
remote-endpoint = <&funnel_in_port0>;
out-ports {
port {
ptm0_out_port: endpoint {
remote-endpoint = <&funnel_in_port0>;
};
};
};
};
@ -262,9 +267,11 @@ Example:
cpu = <&cpu1>;
clocks = <&oscclk6a>;
clock-names = "apb_pclk";
port {
ptm1_out_port: endpoint {
remote-endpoint = <&funnel_in_port1>;
out-ports {
port {
ptm1_out_port: endpoint {
remote-endpoint = <&funnel_in_port1>;
};
};
};
};
@ -278,9 +285,11 @@ Example:
clocks = <&soc_smc50mhz>;
clock-names = "apb_pclk";
port {
stm_out_port: endpoint {
remote-endpoint = <&main_funnel_in_port2>;
out-ports {
port {
stm_out_port: endpoint {
remote-endpoint = <&main_funnel_in_port2>;
};
};
};
};
@ -295,10 +304,11 @@ Example:
clock-names = "apb_pclk";
interrupts = <GIC_SPI 4 IRQ_TYPE_LEVEL_HIGH>;
port {
catu_in_port: endpoint {
slave-mode;
remote-endpoint = <&etr_out_port>;
in-ports {
port {
catu_in_port: endpoint {
remote-endpoint = <&etr_out_port>;
};
};
};
};

View File

@ -59,9 +59,11 @@ mhz values (normalized w.r.t. the highest value found while parsing the DT).
===========================================
Example 1 (ARM 64-bit, 6-cpu system, two clusters):
capacities-dmips-mhz are scaled w.r.t. 1024 (cpu@0 and cpu@1)
supposing cluster0@max-freq=1100 and custer1@max-freq=850,
final capacities are 1024 for cluster0 and 446 for cluster1
The capacities-dmips-mhz or DMIPS/MHz values (scaled to 1024)
are 1024 and 578 for cluster0 and cluster1. Further normalization
is done by the operating system based on cluster0@max-freq=1100 and
custer1@max-freq=850, final capacities are 1024 for cluster0 and
446 for cluster1 (576*850/1100).
cpus {
#address-cells = <2>;

View File

@ -14,7 +14,28 @@ Related properties: (none)
Note:
This enable method requires valid nodes compatible with
"al,alpine-cpu-resume" and "al,alpine-nb-service"[1].
"al,alpine-cpu-resume" and "al,alpine-nb-service".
* Alpine CPU resume registers
The CPU resume register are used to define required resume address after
reset.
Properties:
- compatible : Should contain "al,alpine-cpu-resume".
- reg : Offset and length of the register set for the device
* Alpine System-Fabric Service Registers
The System-Fabric Service Registers allow various operation on CPU and
system fabric, like powering CPUs off.
Properties:
- compatible : Should contain "al,alpine-sysfabric-service" and "syscon".
- reg : Offset and length of the register set for the device
Example:
@ -48,5 +69,12 @@ cpus {
};
};
--
[1] arm/al,alpine.txt
cpu_resume {
compatible = "al,alpine-cpu-resume";
reg = <0xfbff5ed0 0x30>;
};
nb_service {
compatible = "al,alpine-sysfabric-service", "syscon";
reg = <0xfb070000 0x10000>;
};

View File

@ -276,7 +276,7 @@ described below.
Usage: optional
Value type: <prop-encoded-array>
Definition: A u32 value that represents the running time dynamic
power coefficient in units of mW/MHz/uV^2. The
power coefficient in units of uW/MHz/V^2. The
coefficient can either be calculated from power
measurements or derived by analysis.
@ -287,7 +287,7 @@ described below.
Pdyn = dynamic-power-coefficient * V^2 * f
where voltage is in uV, frequency is in MHz.
where voltage is in V, frequency is in MHz.
Example 1 (dual-cluster big.LITTLE system 32-bit):

View File

@ -0,0 +1,19 @@
Freescale DCFG
DCFG is the device configuration unit, that provides general purpose
configuration and status for the device. Such as setting the secondary
core start address and release the secondary core from holdoff and startup.
Required properties:
- compatible: Should contain a chip-specific compatible string,
Chip-specific strings are of the form "fsl,<chip>-dcfg",
The following <chip>s are known to be supported:
ls1012a, ls1021a, ls1043a, ls1046a, ls2080a.
- reg : should contain base address and length of DCFG memory-mapped registers
Example:
dcfg: dcfg@1ee0000 {
compatible = "fsl,ls1021a-dcfg";
reg = <0x0 0x1ee0000 0x0 0x10000>;
};

View File

@ -0,0 +1,19 @@
Freescale SCFG
SCFG is the supplemental configuration unit, that provides SoC specific
configuration and status registers for the chip. Such as getting PEX port
status.
Required properties:
- compatible: Should contain a chip-specific compatible string,
Chip-specific strings are of the form "fsl,<chip>-scfg",
The following <chip>s are known to be supported:
ls1012a, ls1021a, ls1043a, ls1046a, ls2080a.
- reg: should contain base address and length of SCFG memory-mapped registers
Example:
scfg: scfg@1570000 {
compatible = "fsl,ls1021a-scfg";
reg = <0x0 0x1570000 0x0 0x10000>;
};

View File

@ -0,0 +1,183 @@
NXP i.MX System Controller Firmware (SCFW)
--------------------------------------------------------------------
The System Controller Firmware (SCFW) is a low-level system function
which runs on a dedicated Cortex-M core to provide power, clock, and
resource management. It exists on some i.MX8 processors. e.g. i.MX8QM
(QM, QP), and i.MX8QX (QXP, DX).
The AP communicates with the SC using a multi-ported MU module found
in the LSIO subsystem. The current definition of this MU module provides
5 remote AP connections to the SC to support up to 5 execution environments
(TZ, HV, standard Linux, etc.). The SC side of this MU module interfaces
with the LSIO DSC IP bus. The SC firmware will communicate with this MU
using the MSI bus.
System Controller Device Node:
============================================================
The scu node with the following properties shall be under the /firmware/ node.
Required properties:
-------------------
- compatible: should be "fsl,imx-scu".
- mbox-names: should include "tx0", "tx1", "tx2", "tx3",
"rx0", "rx1", "rx2", "rx3".
- mboxes: List of phandle of 4 MU channels for tx and 4 MU channels
for rx. All 8 MU channels must be in the same MU instance.
Cross instances are not allowed. The MU instance can only
be one of LSIO MU0~M4 for imx8qxp and imx8qm. Users need
to make sure use the one which is not conflict with other
execution environments. e.g. ATF.
Note:
Channel 0 must be "tx0" or "rx0".
Channel 1 must be "tx1" or "rx1".
Channel 2 must be "tx2" or "rx2".
Channel 3 must be "tx3" or "rx3".
e.g.
mboxes = <&lsio_mu1 0 0
&lsio_mu1 0 1
&lsio_mu1 0 2
&lsio_mu1 0 3
&lsio_mu1 1 0
&lsio_mu1 1 1
&lsio_mu1 1 2
&lsio_mu1 1 3>;
See Documentation/devicetree/bindings/mailbox/fsl,mu.txt
for detailed mailbox binding.
i.MX SCU Client Device Node:
============================================================
Client nodes are maintained as children of the relevant IMX-SCU device node.
Power domain bindings based on SCU Message Protocol
------------------------------------------------------------
This binding for the SCU power domain providers uses the generic power
domain binding[2].
Required properties:
- compatible: Should be "fsl,scu-pd".
- #address-cells: Should be 1.
- #size-cells: Should be 0.
Required properties for power domain sub nodes:
- #power-domain-cells: Must be 0.
Optional Properties:
- reg: Resource ID of this power domain.
No exist means uncontrollable by user.
See detailed Resource ID list from:
include/dt-bindings/power/imx-rsrc.h
- power-domains: phandle pointing to the parent power domain.
Clock bindings based on SCU Message Protocol
------------------------------------------------------------
This binding uses the common clock binding[1].
Required properties:
- compatible: Should be "fsl,imx8qxp-clock".
- #clock-cells: Should be 1. Contains the Clock ID value.
- clocks: List of clock specifiers, must contain an entry for
each required entry in clock-names
- clock-names: Should include entries "xtal_32KHz", "xtal_24MHz"
The clock consumer should specify the desired clock by having the clock
ID in its "clocks" phandle cell.
See the full list of clock IDs from:
include/dt-bindings/clock/imx8qxp-clock.h
Pinctrl bindings based on SCU Message Protocol
------------------------------------------------------------
This binding uses the i.MX common pinctrl binding[3].
Required properties:
- compatible: Should be "fsl,imx8qxp-iomuxc".
Required properties for Pinctrl sub nodes:
- fsl,pins: Each entry consists of 3 integers which represents
the mux and config setting for one pin. The first 2
integers <pin_id mux_mode> are specified using a
PIN_FUNC_ID macro, which can be found in
<dt-bindings/pinctrl/pads-imx8qxp.h>.
The last integer CONFIG is the pad setting value like
pull-up on this pin.
Please refer to i.MX8QXP Reference Manual for detailed
CONFIG settings.
[1] Documentation/devicetree/bindings/clock/clock-bindings.txt
[2] Documentation/devicetree/bindings/power/power_domain.txt
[3] Documentation/devicetree/bindings/pinctrl/fsl,imx-pinctrl.txt
Example (imx8qxp):
-------------
lsio_mu1: mailbox@5d1c0000 {
...
#mbox-cells = <2>;
};
firmware {
scu {
compatible = "fsl,imx-scu";
mbox-names = "tx0", "tx1", "tx2", "tx3",
"rx0", "rx1", "rx2", "rx3";
mboxes = <&lsio_mu1 0 0
&lsio_mu1 0 1
&lsio_mu1 0 2
&lsio_mu1 0 3
&lsio_mu1 1 0
&lsio_mu1 1 1
&lsio_mu1 1 2
&lsio_mu1 1 3>;
clk: clk {
compatible = "fsl,imx8qxp-clk";
#clock-cells = <1>;
};
iomuxc {
compatible = "fsl,imx8qxp-iomuxc";
pinctrl_lpuart0: lpuart0grp {
fsl,pins = <
SC_P_UART0_RX_ADMA_UART0_RX 0x06000020
SC_P_UART0_TX_ADMA_UART0_TX 0x06000020
>;
};
...
};
imx8qx-pm {
compatible = "fsl,scu-pd";
#address-cells = <1>;
#size-cells = <0>;
pd_dma: dma-power-domain {
#power-domain-cells = <0>;
pd_dma_lpuart0: dma-lpuart0@57 {
reg = <SC_R_UART_0>;
#power-domain-cells = <0>;
power-domains = <&pd_dma>;
};
...
};
...
};
};
};
serial@5a060000 {
...
pinctrl-names = "default";
pinctrl-0 = <&pinctrl_lpuart0>;
clocks = <&clk IMX8QXP_UART0_CLK>,
<&clk IMX8QXP_UART0_IPG_CLK>;
clock-names = "per", "ipg";
power-domains = <&pd_dma_lpuart0>;
};

View File

@ -57,6 +57,50 @@ i.MX6SLL EVK board
Required root node properties:
- compatible = "fsl,imx6sll-evk", "fsl,imx6sll";
i.MX6 Quad Plus SABRE Smart Device Board
Required root node properties:
- compatible = "fsl,imx6qp-sabresd", "fsl,imx6qp";
i.MX6 Quad Plus SABRE Automotive Board
Required root node properties:
- compatible = "fsl,imx6qp-sabreauto", "fsl,imx6qp";
i.MX6 DualLite SABRE Smart Device Board
Required root node properties:
- compatible = "fsl,imx6dl-sabresd", "fsl,imx6dl";
i.MX6 DualLite/Solo SABRE Automotive Board
Required root node properties:
- compatible = "fsl,imx6dl-sabreauto", "fsl,imx6dl";
i.MX6 SoloLite EVK Board
Required root node properties:
- compatible = "fsl,imx6sl-evk", "fsl,imx6sl";
i.MX6 UltraLite 14x14 EVK Board
Required root node properties:
- compatible = "fsl,imx6ul-14x14-evk", "fsl,imx6ul";
i.MX6 UltraLiteLite 14x14 EVK Board
Required root node properties:
- compatible = "fsl,imx6ull-14x14-evk", "fsl,imx6ull";
i.MX6 ULZ 14x14 EVK Board
Required root node properties:
- compatible = "fsl,imx6ulz-14x14-evk", "fsl,imx6ull", "fsl,imx6ulz";
i.MX6 SoloX SDB Board
Required root node properties:
- compatible = "fsl,imx6sx-sdb", "fsl,imx6sx";
i.MX6 SoloX Sabre Auto Board
Required root node properties:
- compatible = "fsl,imx6sx-sabreauto", "fsl,imx6sx";
i.MX7 SabreSD Board
Required root node properties:
- compatible = "fsl,imx7d-sdb", "fsl,imx7d";
Generic i.MX boards
-------------------
@ -101,45 +145,6 @@ Freescale LS1021A Platform Device Tree Bindings
Required root node compatible properties:
- compatible = "fsl,ls1021a";
Freescale SoC-specific Device Tree Bindings
-------------------------------------------
Freescale SCFG
SCFG is the supplemental configuration unit, that provides SoC specific
configuration and status registers for the chip. Such as getting PEX port
status.
Required properties:
- compatible: Should contain a chip-specific compatible string,
Chip-specific strings are of the form "fsl,<chip>-scfg",
The following <chip>s are known to be supported:
ls1012a, ls1021a, ls1043a, ls1046a, ls2080a.
- reg: should contain base address and length of SCFG memory-mapped registers
Example:
scfg: scfg@1570000 {
compatible = "fsl,ls1021a-scfg";
reg = <0x0 0x1570000 0x0 0x10000>;
};
Freescale DCFG
DCFG is the device configuration unit, that provides general purpose
configuration and status for the device. Such as setting the secondary
core start address and release the secondary core from holdoff and startup.
Required properties:
- compatible: Should contain a chip-specific compatible string,
Chip-specific strings are of the form "fsl,<chip>-dcfg",
The following <chip>s are known to be supported:
ls1012a, ls1021a, ls1043a, ls1046a, ls2080a.
- reg : should contain base address and length of DCFG memory-mapped registers
Example:
dcfg: dcfg@1ee0000 {
compatible = "fsl,ls1021a-dcfg";
reg = <0x0 0x1ee0000 0x0 0x10000>;
};
Freescale ARMv8 based Layerscape SoC family Device Tree Bindings
----------------------------------------------------------------

View File

@ -8,6 +8,14 @@ HiKey960 Board
Required root node properties:
- compatible = "hisilicon,hi3660-hikey960", "hisilicon,hi3660";
Hi3670 SoC
Required root node properties:
- compatible = "hisilicon,hi3670";
HiKey970 Board
Required root node properties:
- compatible = "hisilicon,hi3670-hikey970", "hisilicon,hi3670";
Hi3798cv200 SoC
Required root node properties:
- compatible = "hisilicon,hi3798cv200";

View File

@ -45,11 +45,15 @@ Optional Properties:
debug_messages - Map the Debug message region
- reg: register space corresponding to the debug_messages
- ti,system-reboot-controller: If system reboot can be triggered by SoC reboot
- ti,host-id: Integer value corresponding to the host ID assigned by Firmware
for identification of host processing entities such as virtual
machines
Example (K2G):
-------------
pmmc: pmmc {
compatible = "ti,k2g-sci";
ti,host-id = <2>;
mbox-names = "rx", "tx";
mboxes= <&msgmgr &msgmgr_proxy_pmmc_rx>,
<&msgmgr &msgmgr_proxy_pmmc_tx>;

View File

@ -10,6 +10,7 @@ Required Properties:
- "mediatek,mt2712-apmixedsys", "syscon"
- "mediatek,mt6797-apmixedsys"
- "mediatek,mt7622-apmixedsys"
- "mediatek,mt7623-apmixedsys", "mediatek,mt2701-apmixedsys"
- "mediatek,mt8135-apmixedsys"
- "mediatek,mt8173-apmixedsys"
- #clock-cells: Must be 1

View File

@ -8,6 +8,7 @@ Required Properties:
- compatible: Should be one of:
- "mediatek,mt2701-audsys", "syscon"
- "mediatek,mt7622-audsys", "syscon"
- "mediatek,mt7623-audsys", "mediatek,mt2701-audsys", "syscon"
- #clock-cells: Must be 1
The AUDSYS controller uses the common clk binding from

View File

@ -8,6 +8,7 @@ Required Properties:
- compatible: Should be:
- "mediatek,mt2701-bdpsys", "syscon"
- "mediatek,mt2712-bdpsys", "syscon"
- "mediatek,mt7623-bdpsys", "mediatek,mt2701-bdpsys", "syscon"
- #clock-cells: Must be 1
The bdpsys controller uses the common clk binding from

View File

@ -8,6 +8,7 @@ Required Properties:
- compatible: Should be:
- "mediatek,mt2701-ethsys", "syscon"
- "mediatek,mt7622-ethsys", "syscon"
- "mediatek,mt7623-ethsys", "mediatek,mt2701-ethsys", "syscon"
- #clock-cells: Must be 1
- #reset-cells: Must be 1

View File

@ -9,6 +9,7 @@ Required Properties:
- compatible: Should be:
- "mediatek,mt2701-hifsys", "syscon"
- "mediatek,mt7622-hifsys", "syscon"
- "mediatek,mt7623-hifsys", "mediatek,mt2701-hifsys", "syscon"
- #clock-cells: Must be 1
The hifsys controller uses the common clk binding from

View File

@ -9,6 +9,7 @@ Required Properties:
- "mediatek,mt2701-imgsys", "syscon"
- "mediatek,mt2712-imgsys", "syscon"
- "mediatek,mt6797-imgsys", "syscon"
- "mediatek,mt7623-imgsys", "mediatek,mt2701-imgsys", "syscon"
- "mediatek,mt8173-imgsys", "syscon"
- #clock-cells: Must be 1

View File

@ -11,6 +11,7 @@ Required Properties:
- "mediatek,mt2712-infracfg", "syscon"
- "mediatek,mt6797-infracfg", "syscon"
- "mediatek,mt7622-infracfg", "syscon"
- "mediatek,mt7623-infracfg", "mediatek,mt2701-infracfg", "syscon"
- "mediatek,mt8135-infracfg", "syscon"
- "mediatek,mt8173-infracfg", "syscon"
- #clock-cells: Must be 1

View File

@ -9,6 +9,7 @@ Required Properties:
- "mediatek,mt2701-mmsys", "syscon"
- "mediatek,mt2712-mmsys", "syscon"
- "mediatek,mt6797-mmsys", "syscon"
- "mediatek,mt7623-mmsys", "mediatek,mt2701-mmsys", "syscon"
- "mediatek,mt8173-mmsys", "syscon"
- #clock-cells: Must be 1

View File

@ -10,6 +10,7 @@ Required Properties:
- "mediatek,mt2701-pericfg", "syscon"
- "mediatek,mt2712-pericfg", "syscon"
- "mediatek,mt7622-pericfg", "syscon"
- "mediatek,mt7623-pericfg", "mediatek,mt2701-pericfg", "syscon"
- "mediatek,mt8135-pericfg", "syscon"
- "mediatek,mt8173-pericfg", "syscon"
- #clock-cells: Must be 1

View File

@ -10,6 +10,7 @@ Required Properties:
- "mediatek,mt2712-topckgen", "syscon"
- "mediatek,mt6797-topckgen"
- "mediatek,mt7622-topckgen"
- "mediatek,mt7623-topckgen", "mediatek,mt2701-topckgen"
- "mediatek,mt8135-topckgen"
- "mediatek,mt8173-topckgen"
- #clock-cells: Must be 1

View File

@ -9,6 +9,7 @@ Required Properties:
- "mediatek,mt2701-vdecsys", "syscon"
- "mediatek,mt2712-vdecsys", "syscon"
- "mediatek,mt6797-vdecsys", "syscon"
- "mediatek,mt7623-vdecsys", "mediatek,mt2701-vdecsys", "syscon"
- "mediatek,mt8173-vdecsys", "syscon"
- #clock-cells: Must be 1

View File

@ -21,10 +21,29 @@ PROPERTIES
the register region. An optional second element specifies
the base address and size of the alias register region.
- clocks:
Usage: required
Value type: <prop-encoded-array>
Definition: reference to the pll parents.
- clock-names:
Usage: required
Value type: <stringlist>
Definition: must be "pll8_vote", "pxo".
- clock-output-names:
Usage: optional
Value type: <string>
Definition: Name of the output clock. Typically acpuX_aux where X is a
CPU number starting at 0.
Example:
clock-controller@2088000 {
compatible = "qcom,kpss-acc-v2";
reg = <0x02088000 0x1000>,
<0x02008000 0x1000>;
clocks = <&gcc PLL8_VOTE>, <&gcc PXO_SRC>;
clock-names = "pll8_vote", "pxo";
clock-output-names = "acpu0_aux";
};

View File

@ -0,0 +1,44 @@
Krait Processor Sub-system (KPSS) Global Clock Controller (GCC)
PROPERTIES
- compatible:
Usage: required
Value type: <string>
Definition: should be one of the following. The generic compatible
"qcom,kpss-gcc" should also be included.
"qcom,kpss-gcc-ipq8064", "qcom,kpss-gcc"
"qcom,kpss-gcc-apq8064", "qcom,kpss-gcc"
"qcom,kpss-gcc-msm8974", "qcom,kpss-gcc"
"qcom,kpss-gcc-msm8960", "qcom,kpss-gcc"
- reg:
Usage: required
Value type: <prop-encoded-array>
Definition: base address and size of the register region
- clocks:
Usage: required
Value type: <prop-encoded-array>
Definition: reference to the pll parents.
- clock-names:
Usage: required
Value type: <stringlist>
Definition: must be "pll8_vote", "pxo".
- clock-output-names:
Usage: required
Value type: <string>
Definition: Name of the output clock. Typically acpu_l2_aux indicating
an L2 cache auxiliary clock.
Example:
l2cc: clock-controller@2011000 {
compatible = "qcom,kpss-gcc-ipq8064", "qcom,kpss-gcc";
reg = <0x2011000 0x1000>;
clocks = <&gcc PLL8_VOTE>, <&gcc PXO_SRC>;
clock-names = "pll8_vote", "pxo";
clock-output-names = "acpu_l2_aux";
};

View File

@ -16,11 +16,26 @@ Properties:
- reg:
Usage: required
Value Type: <prop-encoded-array>
Definition: Start address and the the size of the register region.
Definition: The first element specifies the llcc base start address and
the size of the register region. The second element specifies
the llcc broadcast base address and size of the register region.
- reg-names:
Usage: required
Value Type: <stringlist>
Definition: Register region names. Must be "llcc_base", "llcc_broadcast_base".
- interrupts:
Usage: required
Definition: The interrupt is associated with the llcc edac device.
It's used for llcc cache single and double bit error detection
and reporting.
Example:
cache-controller@1100000 {
compatible = "qcom,sdm845-llcc";
reg = <0x1100000 0x250000>;
reg = <0x1100000 0x200000>, <0x1300000 0x50000> ;
reg-names = "llcc_base", "llcc_broadcast_base";
interrupts = <GIC_SPI 582 IRQ_TYPE_LEVEL_HIGH>;
};

View File

@ -5,6 +5,10 @@ Rockchip platforms device tree bindings
Required root node properties:
- compatible = "vamrs,ficus", "rockchip,rk3399";
- 96boards RK3399 Rock960 (ROCK960 Consumer Edition)
Required root node properties:
- compatible = "vamrs,rock960", "rockchip,rk3399";
- Amarula Vyasa RK3288 board
Required root node properties:
- compatible = "amarula,vyasa-rk3288", "rockchip,rk3288";
@ -13,6 +17,10 @@ Rockchip platforms device tree bindings
Required root node properties:
- compatible = "asus,rk3288-tinker", "rockchip,rk3288";
- Asus Tinker board S
Required root node properties:
- compatible = "asus,rk3288-tinker-s", "rockchip,rk3288";
- Kylin RK3036 board:
Required root node properties:
- compatible = "rockchip,kylin-rk3036", "rockchip,rk3036";
@ -59,6 +67,10 @@ Rockchip platforms device tree bindings
Required root node properties:
- compatible = "firefly,roc-rk3328-cc", "rockchip,rk3328";
- Firefly ROC-RK3399-PC board:
Required root node properties:
- compatible = "firefly,roc-rk3399-pc", "rockchip,rk3399";
- ChipSPARK PopMetal-RK3288 board:
Required root node properties:
- compatible = "chipspark,popmetal-rk3288", "rockchip,rk3288";
@ -160,6 +172,10 @@ Rockchip platforms device tree bindings
Required root node properties:
- compatible = "pine64,rock64", "rockchip,rk3328";
- Pine64 RockPro64 board:
Required root node properties:
- compatible = "pine64,rockpro64", "rockchip,rk3399";
- Rockchip PX3 Evaluation board:
Required root node properties:
- compatible = "rockchip,px3-evb", "rockchip,px3", "rockchip,rk3188";
@ -168,6 +184,10 @@ Rockchip platforms device tree bindings
Required root node properties:
- compatible = "rockchip,px5-evb", "rockchip,px5", "rockchip,rk3368";
- Rockchip PX30 Evaluation board:
Required root node properties:
- compatible = "rockchip,px30-evb", "rockchip,px30";
- Rockchip RV1108 Evaluation board
Required root node properties:
- compatible = "rockchip,rv1108-evb", "rockchip,rv1108";

View File

@ -22,7 +22,7 @@ References:
Example:
scu@a04100000 {
scu@a0410000 {
compatible = "arm,cortex-a9-scu";
reg = <0xa0410000 0x100>;
};

View File

@ -32,7 +32,8 @@ describe the view of Secure world using the standard bindings. These
secure- bindings only need to be used where both the Secure and Normal
world views need to be described in a single device tree.
Valid Secure world properties:
Valid Secure world properties
-----------------------------
- secure-status : specifies whether the device is present and usable
in the secure world. The combination of this with "status" allows
@ -51,3 +52,19 @@ Valid Secure world properties:
status = "disabled"; secure-status = "okay"; /* S-only */
status = "disabled"; /* disabled in both */
status = "disabled"; secure-status = "disabled"; /* disabled in both */
The secure-chosen node
----------------------
Similar to the /chosen node which serves as a place for passing data
between firmware and the operating system, the /secure-chosen node may
be used to pass data to the Secure OS. Only the properties defined
below may appear in the /secure-chosen node.
- stdout-path : specifies the device to be used by the Secure OS for
its console output. The syntax is the same as for /chosen/stdout-path.
If the /secure-chosen node exists but the stdout-path property is not
present, the Secure OS should not perform any console output. If
/secure-chosen does not exist, the Secure OS should use the value of
/chosen/stdout-path instead (that is, use the same device as the
Normal world OS).

Some files were not shown because too many files have changed in this diff Show More