From a5f6f88c3d1a453dd35cbaac2870f5fae866ad2e Mon Sep 17 00:00:00 2001
From: Jonathan Corbet
Date: Fri, 24 May 2019 14:22:36 -0600
Subject: [PATCH 001/129] docs: Do not seek comments in kernel/rcu/tree_plugin.h

There are no kerneldoc comments in this file, so do not attempt to include
them in the docs build.

Signed-off-by: Jonathan Corbet
---
 Documentation/core-api/kernel-api.rst | 2 --
 Documentation/driver-api/basics.rst   | 3 ---
 2 files changed, 5 deletions(-)

diff --git a/Documentation/core-api/kernel-api.rst b/Documentation/core-api/kernel-api.rst
index a29c99d13331..a53ec2eb8176 100644
--- a/Documentation/core-api/kernel-api.rst
+++ b/Documentation/core-api/kernel-api.rst
@@ -358,8 +358,6 @@ Read-Copy Update (RCU)
 
 .. kernel-doc:: kernel/rcu/tree.c
 
-.. kernel-doc:: kernel/rcu/tree_plugin.h
-
 .. kernel-doc:: kernel/rcu/tree_exp.h
 
 .. kernel-doc:: kernel/rcu/update.c
diff --git a/Documentation/driver-api/basics.rst b/Documentation/driver-api/basics.rst
index e970fadf4d1a..1ba88c7b3984 100644
--- a/Documentation/driver-api/basics.rst
+++ b/Documentation/driver-api/basics.rst
@@ -115,9 +115,6 @@ Kernel utility functions
 .. kernel-doc:: kernel/rcu/tree.c
    :export:
 
-.. kernel-doc:: kernel/rcu/tree_plugin.h
-   :export:
-
 .. kernel-doc:: kernel/rcu/update.c
    :export:
 

From e8d4f892bb245702ee23abfcd28eb98b5eca6c86 Mon Sep 17 00:00:00 2001
From: Jonathan Corbet
Date: Fri, 24 May 2019 14:31:50 -0600
Subject: [PATCH 002/129] docs: Fix a misdirected kerneldoc directive

The stratix10 service layer documentation tried to include a kerneldoc
comments for a nonexistent struct; leading to a "no structured comments
found" message. Switch it to stratix10_svc_command_config_type, which
appears at that spot in the sequence and was not included.

Signed-off-by: Jonathan Corbet
---
 Documentation/driver-api/firmware/other_interfaces.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/driver-api/firmware/other_interfaces.rst b/Documentation/driver-api/firmware/other_interfaces.rst
index a4ac54b5fd79..b81794e0cfbb 100644
--- a/Documentation/driver-api/firmware/other_interfaces.rst
+++ b/Documentation/driver-api/firmware/other_interfaces.rst
@@ -33,7 +33,7 @@ of the requests on to a secure monitor (EL3).
    :functions: stratix10_svc_client_msg
 
 .. kernel-doc:: include/linux/firmware/intel/stratix10-svc-client.h
-   :functions: stratix10_svc_command_reconfig_payload
+   :functions: stratix10_svc_command_config_type
 
 .. kernel-doc:: include/linux/firmware/intel/stratix10-svc-client.h
    :functions: stratix10_svc_cb_data

From 41ce14e39bbe0683a2d49385ee8a8cb0b1d010eb Mon Sep 17 00:00:00 2001
From: Jonathan Corbet
Date: Fri, 24 May 2019 14:43:42 -0600
Subject: [PATCH 003/129] docs: Do not seek kerneldoc comments in hw-consumer.h

There are no kerneldoc comments here, so looking for them just yields a
warning in the docs build.

Signed-off-by: Jonathan Corbet
---
 Documentation/driver-api/iio/hw-consumer.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/Documentation/driver-api/iio/hw-consumer.rst b/Documentation/driver-api/iio/hw-consumer.rst
index e0fe0b98230e..819fb9edc005 100644
--- a/Documentation/driver-api/iio/hw-consumer.rst
+++ b/Documentation/driver-api/iio/hw-consumer.rst
@@ -45,7 +45,6 @@ A typical IIO HW consumer setup looks like this::
 
 More details
 ============
-.. kernel-doc:: include/linux/iio/hw-consumer.h
 .. kernel-doc:: drivers/iio/buffer/industrialio-hw-consumer.c
    :export:
 

From 3aef4472665695be7cbdd2cc274814f56d36e4ef Mon Sep 17 00:00:00 2001
From: Jonathan Corbet
Date: Fri, 24 May 2019 15:01:30 -0600
Subject: [PATCH 004/129] docs: No structured comments in target_core_device.c

Documentation/driver-api/target.rst is seeking kerneldoc comments in
drivers/target/target_core_device.c, but no such comments exist. Take out
the kernel-doc directive and eliminate one warning from the build.

Signed-off-by: Jonathan Corbet
---
 Documentation/driver-api/target.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/driver-api/target.rst b/Documentation/driver-api/target.rst
index 4363611dd86d..620ec6173a93 100644
--- a/Documentation/driver-api/target.rst
+++ b/Documentation/driver-api/target.rst
@@ -10,8 +10,8 @@ TBD
 Target core device interfaces
 =============================
 
-.. kernel-doc:: drivers/target/target_core_device.c
-   :export:
+This section is blank because no kerneldoc comments have been added to
+drivers/target/target_core_device.c.
 
 Target core transport interfaces
 ================================

From dea20be5063c97bdac48e81ee2a85975f14885ed Mon Sep 17 00:00:00 2001
From: Jonathan Corbet
Date: Fri, 24 May 2019 15:03:39 -0600
Subject: [PATCH 005/129] docs: no structured comments in fs/file_table.c

Remove the kernel-doc directive, since there are only warnings to be found
there.

Signed-off-by: Jonathan Corbet
---
 Documentation/filesystems/api-summary.rst | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/Documentation/filesystems/api-summary.rst b/Documentation/filesystems/api-summary.rst
index aa51ffcfa029..bbb0c1c0e5cf 100644
--- a/Documentation/filesystems/api-summary.rst
+++ b/Documentation/filesystems/api-summary.rst
@@ -89,9 +89,6 @@ Other Functions
 .. kernel-doc:: fs/direct-io.c
    :export:
 
-.. kernel-doc:: fs/file_table.c
-   :export:
-
 .. kernel-doc:: fs/libfs.c
    :export:
 

From 3f715b147a6c5245ee25d7334f4053c339feef98 Mon Sep 17 00:00:00 2001
From: Jonathan Corbet
Date: Fri, 24 May 2019 15:05:41 -0600
Subject: [PATCH 006/129] docs: No structured comments in include/linux/interconnect.h

Remove the kernel-doc directive for this file, since there's nothing there
and it generates a warning.

Signed-off-by: Jonathan Corbet
---
 Documentation/interconnect/interconnect.rst | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/Documentation/interconnect/interconnect.rst b/Documentation/interconnect/interconnect.rst
index b8107dcc4cd3..c3e004893796 100644
--- a/Documentation/interconnect/interconnect.rst
+++ b/Documentation/interconnect/interconnect.rst
@@ -89,6 +89,5 @@ Interconnect consumers
 
 Interconnect consumers are the clients which use the interconnect APIs to
 get paths between endpoints and set their bandwidth/latency/QoS requirements
-for these interconnect paths.
-
-.. kernel-doc:: include/linux/interconnect.h
+for these interconnect paths. These interfaces are not currently
+documented.

From b0d60bfbb60cef1efd699a65e29a94487f8c7b1f Mon Sep 17 00:00:00 2001
From: Jonathan Corbet
Date: Fri, 24 May 2019 14:52:01 -0600
Subject: [PATCH 007/129] kernel-doc: always name missing kerneldoc sections

The "no structured comments found" warning is not particularly useful if
there are several invocations, one of which is looking for something
wrong. So if something specific has been requested, make it clear that
it's the one we weren't able to find.

Signed-off-by: Jonathan Corbet
---
 scripts/kernel-doc | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/scripts/kernel-doc b/scripts/kernel-doc
index 3350e498b4ce..c0cb41e65b9b 100755
--- a/scripts/kernel-doc
+++ b/scripts/kernel-doc
@@ -285,7 +285,7 @@ use constant {
     OUTPUT_INTERNAL => 4, # output non-exported symbols
 };
 my $output_selection = OUTPUT_ALL;
-my $show_not_found = 0;
+my $show_not_found = 0; # No longer used
 
 my @export_file_list;
 
@@ -435,7 +435,7 @@ while ($ARGV[0] =~ m/^--?(.*)/) {
     } elsif ($cmd eq 'enable-lineno') {
         $enable_lineno = 1;
     } elsif ($cmd eq 'show-not-found') {
-        $show_not_found = 1;
+        $show_not_found = 1; # A no-op but don't fail
     } else {
         # Unknown argument
         usage();
@@ -2163,12 +2163,14 @@ sub process_file($) {
     }
 
     # Make sure we got something interesting.
-    if ($initial_section_counter == $section_counter) {
-        if ($output_mode ne "none") {
-            print STDERR "${file}:1: warning: no structured comments found\n";
+    if ($initial_section_counter == $section_counter && $
+        output_mode ne "none") {
+        if ($output_selection == OUTPUT_INCLUDE) {
+            print STDERR "${file}:1: warning: '$_' not found\n"
+                for keys %function_table;
         }
-        if (($output_selection == OUTPUT_INCLUDE) && ($show_not_found == 1)) {
-            print STDERR " Was looking for '$_'.\n" for keys %function_table;
+        else {
+            print STDERR "${file}:1: warning: no structured comments found\n";
         }
     }
 }

From 42f6ebd827832e62a37350ffad776ea785a2486b Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 23 May 2019 07:43:43 -0300
Subject: [PATCH 008/129] docs: cdomain.py: get rid of a warning since version 1.8

There's a new warning about a deprecation function. Add a logic at
cdomain.py to avoid that.

Signed-off-by: Mauro Carvalho Chehab
Signed-off-by: Jonathan Corbet
---
 Documentation/sphinx/cdomain.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/sphinx/cdomain.py b/Documentation/sphinx/cdomain.py
index cf13ff3a656c..cbac8e608dc4 100644
--- a/Documentation/sphinx/cdomain.py
+++ b/Documentation/sphinx/cdomain.py
@@ -48,7 +48,10 @@ major, minor, patch = sphinx.version_info[:3]
 
 def setup(app):
 
-    app.override_domain(CDomain)
+    if (major == 1 and minor < 8):
+        app.override_domain(CDomain)
+    else:
+        app.add_domain(CDomain, override=True)
 
     return dict(
         version = __version__,

From fe4ec72cca500b2f97ffa0429b4cd57f67e0821d Mon Sep 17 00:00:00 2001
From: Masanari Iida
Date: Tue, 21 May 2019 21:30:00 +0900
Subject: [PATCH 009/129] docs: tracing: Fix typos in histogram.rst

This patch fixes some spelling typos in histogram.rst

Signed-off-by: Masanari Iida
Acked-by: Steven Rostedt (VMware)
Signed-off-by: Jonathan Corbet
---
 Documentation/trace/histogram.rst | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/trace/histogram.rst b/Documentation/trace/histogram.rst
index fb621a1c2638..8408670d0328 100644
--- a/Documentation/trace/histogram.rst
+++ b/Documentation/trace/histogram.rst
@@ -1010,7 +1010,7 @@ Extended error information
 
   For example, suppose we wanted to take a look at the relative
   weights in terms of skb length for each callpath that leads to a
-  netif_receieve_skb event when downloading a decent-sized file using
+  netif_receive_skb event when downloading a decent-sized file using
   wget.
First we set up an initially paused stacktrace trigger on the @@ -1843,7 +1843,7 @@ practice, not every handler.action combination is currently supported; if a given handler.action combination isn't supported, the hist trigger will fail with -EINVAL; -The default 'handler.action' if none is explicity specified is as it +The default 'handler.action' if none is explicitly specified is as it always has been, to simply update the set of values associated with an entry. Some applications, however, may want to perform additional actions at that point, such as generate another event, or compare and @@ -2088,7 +2088,7 @@ The following commonly-used handler.action pairs are available: and the saved values corresponding to the max are displayed following the rest of the fields. - If a snaphot was taken, there is also a message indicating that, + If a snapshot was taken, there is also a message indicating that, along with the value and event that triggered the global maximum: # cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist @@ -2176,7 +2176,7 @@ The following commonly-used handler.action pairs are available: hist trigger entry. Note that in this case the changed value is a global variable - associated withe current trace instance. The key of the specific + associated with current trace instance. The key of the specific trace event that caused the value to change and the global value itself are displayed, along with a message stating that a snapshot has been taken and where to find it. The user can use the key @@ -2203,7 +2203,7 @@ The following commonly-used handler.action pairs are available: and the saved values corresponding to that value are displayed following the rest of the fields. 
- If a snaphot was taken, there is also a message indicating that, + If a snapshot was taken, there is also a message indicating that, along with the value and event that triggered the snapshot:: # cat /sys/kernel/debug/tracing/events/tcp/tcp_probe/hist From 93285c01977729a2e046e065e4b99791b966130c Mon Sep 17 00:00:00 2001 From: Zhenzhong Duan Date: Tue, 21 May 2019 10:32:08 +0800 Subject: [PATCH 010/129] doc: kernel-parameters.txt: fix documentation of nmi_watchdog parameter The default behavior of hardlockup depends on the config of CONFIG_BOOTPARAM_HARDLOCKUP_PANIC. Fix the description of nmi_watchdog to make it clear. Suggested-by: Steven Rostedt (VMware) Signed-off-by: Zhenzhong Duan Reviewed-by: Joel Fernandes (Google) Acked-by: Ingo Molnar Acked-by: Steven Rostedt (VMware) Cc: Thomas Gleixner Cc: Kees Cook Cc: Greg Kroah-Hartman Cc: linux-doc@vger.kernel.org Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/kernel-parameters.txt | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 138f6664b2e2..79d043b8850d 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2836,8 +2836,9 @@ 0 - turn hardlockup detector in nmi_watchdog off 1 - turn hardlockup detector in nmi_watchdog on When panic is specified, panic when an NMI watchdog - timeout occurs (or 'nopanic' to override the opposite - default). To disable both hard and soft lockup detectors, + timeout occurs (or 'nopanic' to not panic on an NMI + watchdog, if CONFIG_BOOTPARAM_HARDLOCKUP_PANIC is set) + To disable both hard and soft lockup detectors, please see 'nowatchdog'. This is useful when you use a panic=... timeout and need the box quickly up again. From 50c1f43a37d006ac24755397614b00064a8f293a Mon Sep 17 00:00:00 2001 From: "Tobin C. 
Harding" Date: Wed, 15 May 2019 10:29:05 +1000 Subject: [PATCH 011/129] docs: filesystems: vfs: Remove space before tab Currently the file has a bunch of spaces before tabspaces. This is a nuisance when patching the file because they show up whenever we touch these lines. Let's just fix them all now in preparation for doing the RST conversion. Remove spaces before tabspaces. Tested-by: Randy Dunlap Signed-off-by: Tobin C. Harding Signed-off-by: Jonathan Corbet --- Documentation/filesystems/vfs.txt | 78 +++++++++++++++---------------- 1 file changed, 39 insertions(+), 39 deletions(-) diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 57fc576b1f3e..cab5a36f39c6 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -134,7 +134,7 @@ struct file_system_type { should be shut down owner: for internal VFS use: you should initialize this to THIS_MODULE in - most cases. + most cases. next: for internal VFS use: you should initialize this to NULL @@ -143,7 +143,7 @@ struct file_system_type { The mount() method has the following arguments: struct file_system_type *fs_type: describes the filesystem, partly initialized - by the specific filesystem code + by the specific filesystem code int flags: mount flags @@ -180,12 +180,12 @@ and provides a fill_super() callback instead. The generic variants are: mount_nodev: mount a filesystem that is not backed by a device mount_single: mount a filesystem which shares the instance between - all mounts + all mounts A fill_super() callback implementation has the following arguments: struct super_block *sb: the superblock structure. The callback - must initialize this properly. + must initialize this properly. void *data: arbitrary mount options, usually comes as an ASCII string (see "Mount Options" section) @@ -236,14 +236,14 @@ only called from a process context (i.e. not from an interrupt handler or bottom half). 
alloc_inode: this method is called by alloc_inode() to allocate memory - for struct inode and initialize it. If this function is not - defined, a simple 'struct inode' is allocated. Normally - alloc_inode will be used to allocate a larger structure which - contains a 'struct inode' embedded within it. + for struct inode and initialize it. If this function is not + defined, a simple 'struct inode' is allocated. Normally + alloc_inode will be used to allocate a larger structure which + contains a 'struct inode' embedded within it. destroy_inode: this method is called by destroy_inode() to release - resources allocated for struct inode. It is only required if - ->alloc_inode was defined and simply undoes anything done by + resources allocated for struct inode. It is only required if + ->alloc_inode was defined and simply undoes anything done by ->alloc_inode. dirty_inode: this method is called by the VFS to mark an inode dirty. @@ -271,15 +271,15 @@ or bottom half). (i.e. unmount). This is called with the superblock lock held sync_fs: called when VFS is writing out all dirty data associated with - a superblock. The second parameter indicates whether the method + a superblock. The second parameter indicates whether the method should wait until the write out has been completed. Optional. freeze_fs: called when VFS is locking a filesystem and - forcing it into a consistent state. This method is currently - used by the Logical Volume Manager (LVM). + forcing it into a consistent state. This method is currently + used by the Logical Volume Manager (LVM). unfreeze_fs: called when VFS is unlocking a filesystem and making it writable - again. + again. statfs: called when the VFS needs to get filesystem statistics. @@ -476,30 +476,30 @@ otherwise noted. that. permission: called by the VFS to check for access rights on a POSIX-like - filesystem. + filesystem. May be called in rcu-walk mode (mask & MAY_NOT_BLOCK). 
If in rcu-walk - mode, the filesystem must check the permission without blocking or + mode, the filesystem must check the permission without blocking or storing to the inode. If a situation is encountered that rcu-walk cannot handle, return -ECHILD and it will be called again in ref-walk mode. setattr: called by the VFS to set attributes for a file. This method - is called by chmod(2) and related system calls. + is called by chmod(2) and related system calls. getattr: called by the VFS to get attributes of a file. This method - is called by stat(2) and related system calls. + is called by stat(2) and related system calls. listxattr: called by the VFS to list all extended attributes for a given file. This method is called by the listxattr(2) system call. update_time: called by the VFS to update a specific time or the i_version of - an inode. If this is not defined the VFS will update the inode itself - and call mark_inode_dirty_sync. + an inode. If this is not defined the VFS will update the inode itself + and call mark_inode_dirty_sync. atomic_open: called on the last component of an open. Using this optional - method the filesystem can look up, possibly create and open the file in + method the filesystem can look up, possibly create and open the file in one atomic operation. If it wants to leave actual opening to the caller (e.g. if the file turned out to be a symlink, device, or just something filesystem won't do atomic open for), it may signal this by @@ -687,13 +687,13 @@ struct address_space_operations { that all succeeds, ->readpage will be called again. writepages: called by the VM to write out pages associated with the - address_space object. If wbc->sync_mode is WBC_SYNC_ALL, then - the writeback_control will specify a range of pages that must be - written out. If it is WBC_SYNC_NONE, then a nr_to_write is given + address_space object. If wbc->sync_mode is WBC_SYNC_ALL, then + the writeback_control will specify a range of pages that must be + written out. 
If it is WBC_SYNC_NONE, then a nr_to_write is given and that many pages should be written if possible. If no ->writepages is given, then mpage_writepages is used - instead. This will choose pages from the address space that are - tagged as DIRTY and will pass them to ->writepage. + instead. This will choose pages from the address space that are + tagged as DIRTY and will pass them to ->writepage. set_page_dirty: called by the VM to set a page dirty. This is particularly needed if an address space attaches @@ -704,11 +704,11 @@ struct address_space_operations { PAGECACHE_TAG_DIRTY tag in the radix tree. readpages: called by the VM to read pages associated with the address_space - object. This is essentially just a vector version of - readpage. Instead of just one page, several pages are - requested. + object. This is essentially just a vector version of + readpage. Instead of just one page, several pages are + requested. readpages is only used for read-ahead, so read errors are - ignored. If anything goes wrong, feel free to give up. + ignored. If anything goes wrong, feel free to give up. write_begin: Called by the generic buffered write code to ask the filesystem to @@ -745,12 +745,12 @@ struct address_space_operations { that were able to be copied into pagecache. bmap: called by the VFS to map a logical block offset within object to - physical block number. This method is used by the FIBMAP - ioctl and for working with swap-files. To be able to swap to - a file, the file must have a stable mapping to a block - device. The swap system does not go through the filesystem - but instead uses bmap to find out where the blocks in the file - are and uses those addresses directly. + physical block number. This method is used by the FIBMAP + ioctl and for working with swap-files. To be able to swap to + a file, the file must have a stable mapping to a block + device. 
The swap system does not go through the filesystem + but instead uses bmap to find out where the blocks in the file + are and uses those addresses directly. invalidatepage: If a page has PagePrivate set, then invalidatepage will be called when part or all of the page is to be removed @@ -810,7 +810,7 @@ struct address_space_operations { putback_page: Called by the VM when isolated page's migration fails. launder_page: Called before freeing a page - it writes back the dirty page. To - prevent redirtying the page, it is kept locked during the whole + prevent redirtying the page, it is kept locked during the whole operation. is_partially_uptodate: Called by the VM when reading a file through the @@ -921,7 +921,7 @@ otherwise noted. unlocked_ioctl: called by the ioctl(2) system call. compat_ioctl: called by the ioctl(2) system call when 32 bit system calls - are used on 64 bit kernels. + are used on 64 bit kernels. mmap: called by the mmap(2) system call @@ -946,7 +946,7 @@ otherwise noted. (non-blocking) mode is enabled for a file lock: called by the fcntl(2) system call for F_GETLK, F_SETLK, and F_SETLKW - commands + commands get_unmapped_area: called by the mmap(2) system call From 4ee33ea403ac7c1f2b04534132ebb9c3c5095b56 Mon Sep 17 00:00:00 2001 From: "Tobin C. Harding" Date: Wed, 15 May 2019 10:29:06 +1000 Subject: [PATCH 012/129] docs: filesystems: vfs: Use uniform space after period. Currently sometimes document has a single space after a period and sometimes it has double. Whichever we use it should be uniform. Use double space after period, be uniform. Tested-by: Randy Dunlap Signed-off-by: Tobin C. 
Harding Signed-off-by: Jonathan Corbet --- Documentation/filesystems/vfs.txt | 246 +++++++++++++++--------------- 1 file changed, 123 insertions(+), 123 deletions(-) diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index cab5a36f39c6..6088b925aa7f 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -14,12 +14,12 @@ Introduction The Virtual File System (also known as the Virtual Filesystem Switch) is the software layer in the kernel that provides the filesystem -interface to userspace programs. It also provides an abstraction +interface to userspace programs. It also provides an abstraction within the kernel which allows different filesystem implementations to coexist. VFS system calls open(2), stat(2), read(2), write(2), chmod(2) and so -on are called from a process context. Filesystem locking is described +on are called from a process context. Filesystem locking is described in the document Documentation/filesystems/Locking. @@ -27,37 +27,37 @@ Directory Entry Cache (dcache) ------------------------------ The VFS implements the open(2), stat(2), chmod(2), and similar system -calls. The pathname argument that is passed to them is used by the VFS +calls. The pathname argument that is passed to them is used by the VFS to search through the directory entry cache (also known as the dentry -cache or dcache). This provides a very fast look-up mechanism to -translate a pathname (filename) into a specific dentry. Dentries live +cache or dcache). This provides a very fast look-up mechanism to +translate a pathname (filename) into a specific dentry. Dentries live in RAM and are never saved to disc: they exist only for performance. -The dentry cache is meant to be a view into your entire filespace. As +The dentry cache is meant to be a view into your entire filespace. As most computers cannot fit all dentries in the RAM at the same time, -some bits of the cache are missing. 
In order to resolve your pathname +some bits of the cache are missing. In order to resolve your pathname into a dentry, the VFS may have to resort to creating dentries along -the way, and then loading the inode. This is done by looking up the +the way, and then loading the inode. This is done by looking up the inode. The Inode Object ---------------- -An individual dentry usually has a pointer to an inode. Inodes are +An individual dentry usually has a pointer to an inode. Inodes are filesystem objects such as regular files, directories, FIFOs and other beasts. They live either on the disc (for block device filesystems) -or in the memory (for pseudo filesystems). Inodes that live on the +or in the memory (for pseudo filesystems). Inodes that live on the disc are copied into the memory when required and changes to the inode -are written back to disc. A single inode can be pointed to by multiple +are written back to disc. A single inode can be pointed to by multiple dentries (hard links, for example, do this). To look up an inode requires that the VFS calls the lookup() method of -the parent directory inode. This method is installed by the specific -filesystem implementation that the inode lives in. Once the VFS has +the parent directory inode. This method is installed by the specific +filesystem implementation that the inode lives in. Once the VFS has the required dentry (and hence the inode), we can do all those boring things like open(2) the file, or stat(2) it to peek at the inode -data. The stat(2) operation is fairly simple: once the VFS has the +data. The stat(2) operation is fairly simple: once the VFS has the dentry, it peeks at the inode data and passes some of it back to userspace. @@ -67,17 +67,17 @@ The File Object Opening a file requires another operation: allocation of a file structure (this is the kernel-side implementation of file -descriptors). The freshly allocated file structure is initialized with +descriptors). 
The freshly allocated file structure is initialized with a pointer to the dentry and a set of file operation member functions. -These are taken from the inode data. The open() file method is then -called so the specific filesystem implementation can do its work. You -can see that this is another switch performed by the VFS. The file +These are taken from the inode data. The open() file method is then +called so the specific filesystem implementation can do its work. You +can see that this is another switch performed by the VFS. The file structure is placed into the file descriptor table for the process. Reading, writing and closing files (and other assorted VFS operations) is done by using the userspace file descriptor to grab the appropriate file structure, and then calling the required file structure method to -do whatever is required. For as long as the file is open, it keeps the +do whatever is required. For as long as the file is open, it keeps the dentry in use, which in turn means that the VFS inode is still in use. @@ -92,7 +92,7 @@ functions: extern int register_filesystem(struct file_system_type *); extern int unregister_filesystem(struct file_system_type *); -The passed struct file_system_type describes your filesystem. When a +The passed struct file_system_type describes your filesystem. When a request is made to mount a filesystem onto a directory in your namespace, the VFS will call the appropriate mount() method for the specific filesystem. New vfsmount referring to the tree returned by ->mount() @@ -106,7 +106,7 @@ file /proc/filesystems. struct file_system_type ----------------------- -This describes the filesystem. As of kernel 2.6.39, the following +This describes the filesystem. As of kernel 2.6.39, the following members are defined: struct file_system_type { @@ -168,12 +168,12 @@ point of view is a reference to dentry at the root of (sub)tree to be attached; creation of new superblock is a common side effect. 
The most interesting member of the superblock structure that the -mount() method fills in is the "s_op" field. This is a pointer to +mount() method fills in is the "s_op" field. This is a pointer to a "struct super_operations" which describes the next level of the filesystem implementation. Usually, a filesystem uses one of the generic mount() implementations -and provides a fill_super() callback instead. The generic variants are: +and provides a fill_super() callback instead. The generic variants are: mount_bdev: mount a filesystem residing on a block device @@ -184,7 +184,7 @@ and provides a fill_super() callback instead. The generic variants are: A fill_super() callback implementation has the following arguments: - struct super_block *sb: the superblock structure. The callback + struct super_block *sb: the superblock structure. The callback must initialize this properly. void *data: arbitrary mount options, usually comes as an ASCII @@ -203,7 +203,7 @@ struct super_operations ----------------------- This describes how the VFS can manipulate the superblock of your -filesystem. As of kernel 2.6.22, the following members are defined: +filesystem. As of kernel 2.6.22, the following members are defined: struct super_operations { struct inode *(*alloc_inode)(struct super_block *sb); @@ -231,7 +231,7 @@ struct super_operations { }; All methods are called without any locks being held, unless otherwise -noted. This means that most methods can block safely. All methods are +noted. This means that most methods can block safely. All methods are only called from a process context (i.e. not from an interrupt handler or bottom half). @@ -268,11 +268,11 @@ or bottom half). delete_inode: called when the VFS wants to delete an inode put_super: called when the VFS wishes to free the superblock - (i.e. unmount). This is called with the superblock lock held + (i.e. unmount). 
This is called with the superblock lock held sync_fs: called when VFS is writing out all dirty data associated with - a superblock. The second parameter indicates whether the method - should wait until the write out has been completed. Optional. + a superblock. The second parameter indicates whether the method + should wait until the write out has been completed. Optional. freeze_fs: called when VFS is locking a filesystem and forcing it into a consistent state. This method is currently @@ -283,10 +283,10 @@ or bottom half). statfs: called when the VFS needs to get filesystem statistics. - remount_fs: called when the filesystem is remounted. This is called + remount_fs: called when the filesystem is remounted. This is called with the kernel lock held - clear_inode: called then the VFS clears the inode. Optional + clear_inode: called then the VFS clears the inode. Optional umount_begin: called when the VFS is unmounting a filesystem. @@ -307,17 +307,17 @@ or bottom half). implement ->nr_cached_objects for it to be called correctly. We can't do anything with any errors that the filesystem might - encountered, hence the void return type. This will never be called if + encountered, hence the void return type. This will never be called if the VM is trying to reclaim under GFP_NOFS conditions, hence this method does not need to handle that situation itself. Implementations must include conditional reschedule calls inside any - scanning loop that is done. This allows the VFS to determine + scanning loop that is done. This allows the VFS to determine appropriate scan batch sizes without having to worry about whether implementations will cause holdoff problems due to large scan batch sizes. -Whoever sets up the inode is responsible for filling in the "i_op" field. This +Whoever sets up the inode is responsible for filling in the "i_op" field. This is a pointer to a "struct inode_operations" which describes the methods that can be performed on individual inodes. 
@@ -361,7 +361,7 @@ struct inode_operations ----------------------- This describes how the VFS can manipulate an inode in your -filesystem. As of kernel 2.6.22, the following members are defined: +filesystem. As of kernel 2.6.22, the following members are defined: struct inode_operations { int (*create) (struct inode *,struct dentry *, umode_t, bool); @@ -391,19 +391,19 @@ struct inode_operations { Again, all methods are called without any locks being held, unless otherwise noted. - create: called by the open(2) and creat(2) system calls. Only - required if you want to support regular files. The dentry you + create: called by the open(2) and creat(2) system calls. Only + required if you want to support regular files. The dentry you get should not have an inode (i.e. it should be a negative - dentry). Here you will probably call d_instantiate() with the + dentry). Here you will probably call d_instantiate() with the dentry and the newly created inode lookup: called when the VFS needs to look up an inode in a parent - directory. The name to look for is found in the dentry. This + directory. The name to look for is found in the dentry. This method must call d_add() to insert the found inode into the - dentry. The "i_count" field in the inode structure should be - incremented. If the named inode does not exist a NULL inode + dentry. The "i_count" field in the inode structure should be + incremented. If the named inode does not exist a NULL inode should be inserted into the dentry (this is called a negative - dentry). Returning an error code from this routine must only + dentry). Returning an error code from this routine must only be done on a real error, otherwise creating inodes with system calls like create(2), mknod(2), mkdir(2) and so on will fail. If you wish to overload the dentry methods then you should @@ -411,27 +411,27 @@ otherwise noted. to a struct "dentry_operations". 
This method is called with the directory inode semaphore held - link: called by the link(2) system call. Only required if you want - to support hard links. You will probably need to call + link: called by the link(2) system call. Only required if you want + to support hard links. You will probably need to call d_instantiate() just as you would in the create() method - unlink: called by the unlink(2) system call. Only required if you + unlink: called by the unlink(2) system call. Only required if you want to support deleting inodes - symlink: called by the symlink(2) system call. Only required if you - want to support symlinks. You will probably need to call + symlink: called by the symlink(2) system call. Only required if you + want to support symlinks. You will probably need to call d_instantiate() just as you would in the create() method - mkdir: called by the mkdir(2) system call. Only required if you want - to support creating subdirectories. You will probably need to + mkdir: called by the mkdir(2) system call. Only required if you want + to support creating subdirectories. You will probably need to call d_instantiate() just as you would in the create() method - rmdir: called by the rmdir(2) system call. Only required if you want + rmdir: called by the rmdir(2) system call. Only required if you want to support deleting subdirectories mknod: called by the mknod(2) system call to create a device (char, - block) inode or a named pipe (FIFO) or socket. Only required - if you want to support creating these types of inodes. You + block) inode or a named pipe (FIFO) or socket. Only required + if you want to support creating these types of inodes. You will probably need to call d_instantiate() just as you would in the create() method @@ -478,21 +478,21 @@ otherwise noted. permission: called by the VFS to check for access rights on a POSIX-like filesystem. - May be called in rcu-walk mode (mask & MAY_NOT_BLOCK). 
If in rcu-walk - mode, the filesystem must check the permission without blocking or + May be called in rcu-walk mode (mask & MAY_NOT_BLOCK). If in rcu-walk + mode, the filesystem must check the permission without blocking or storing to the inode. If a situation is encountered that rcu-walk cannot handle, return -ECHILD and it will be called again in ref-walk mode. - setattr: called by the VFS to set attributes for a file. This method + setattr: called by the VFS to set attributes for a file. This method is called by chmod(2) and related system calls. - getattr: called by the VFS to get attributes of a file. This method + getattr: called by the VFS to get attributes of a file. This method is called by stat(2) and related system calls. listxattr: called by the VFS to list all extended attributes for a - given file. This method is called by the listxattr(2) system call. + given file. This method is called by the listxattr(2) system call. update_time: called by the VFS to update a specific time or the i_version of an inode. If this is not defined the VFS will update the inode itself @@ -530,7 +530,7 @@ The first can be used independently to the others. The VM can try to either write dirty pages in order to clean them, or release clean pages in order to reuse them. To do this it can call the ->writepage method on dirty pages, and ->releasepage on clean pages with -PagePrivate set. Clean pages without PagePrivate and with no external +PagePrivate set. Clean pages without PagePrivate and with no external references will be released without notice being given to the address_space. @@ -538,7 +538,7 @@ To achieve this functionality, pages need to be placed on an LRU with lru_cache_add and mark_page_active needs to be called whenever the page is used. -Pages are normally kept in a radix tree index by ->index. This tree +Pages are normally kept in a radix tree index by ->index. 
This tree maintains information about the PG_Dirty and PG_Writeback status of each page, so that pages with either of these flags can be found quickly. @@ -624,7 +624,7 @@ struct address_space_operations ------------------------------- This describes how the VFS can manipulate mapping of a file to page cache in -your filesystem. The following members are defined: +your filesystem. The following members are defined: struct address_space_operations { int (*writepage)(struct page *page, struct writeback_control *wbc); @@ -704,7 +704,7 @@ struct address_space_operations { PAGECACHE_TAG_DIRTY tag in the radix tree. readpages: called by the VM to read pages associated with the address_space - object. This is essentially just a vector version of + object. This is essentially just a vector version of readpage. Instead of just one page, several pages are requested. readpages is only used for read-ahead, so read errors are @@ -712,7 +712,7 @@ struct address_space_operations { write_begin: Called by the generic buffered write code to ask the filesystem to - prepare to write len bytes at the given offset in the file. The + prepare to write len bytes at the given offset in the file. The address_space should check that the write will be able to complete, by allocating space if necessary and doing any other internal housekeeping. If the write will update parts of any basic-blocks on @@ -735,7 +735,7 @@ struct address_space_operations { which case write_end is not called. write_end: After a successful write_begin, and data copy, write_end must - be called. len is the original len passed to write_begin, and copied + be called. len is the original len passed to write_begin, and copied is the amount that was able to be copied. The filesystem must take care of unlocking the page and releasing it @@ -745,7 +745,7 @@ struct address_space_operations { that were able to be copied into pagecache. 
bmap: called by the VFS to map a logical block offset within object to - physical block number. This method is used by the FIBMAP + physical block number. This method is used by the FIBMAP ioctl and for working with swap-files. To be able to swap to a file, the file must have a stable mapping to a block device. The swap system does not go through the filesystem @@ -757,7 +757,7 @@ struct address_space_operations { from the address space. This generally corresponds to either a truncation, punch hole or a complete invalidation of the address space (in the latter case 'offset' will always be 0 and 'length' - will be PAGE_SIZE). Any private data associated with the page + will be PAGE_SIZE). Any private data associated with the page should be updated to reflect this truncation. If offset is 0 and length is PAGE_SIZE, then the private data should be released, because the page must be able to be completely discarded. This may @@ -767,7 +767,7 @@ struct address_space_operations { releasepage: releasepage is called on PagePrivate pages to indicate that the page should be freed if possible. ->releasepage should remove any private data from the page and clear the - PagePrivate flag. If releasepage() fails for some reason, it must + PagePrivate flag. If releasepage() fails for some reason, it must indicate failure with a 0 return value. releasepage() is used in two distinct though related cases. The first is when the VM finds a clean page with no active users and @@ -787,7 +787,7 @@ struct address_space_operations { freepage: freepage is called once the page is no longer visible in the page cache in order to allow the cleanup of any private - data. Since it may be called by the memory reclaimer, it + data. Since it may be called by the memory reclaimer, it should not assume that the original address_space mapping still exists, and it should not block. @@ -809,32 +809,32 @@ struct address_space_operations { putback_page: Called by the VM when isolated page's migration fails. 
- launder_page: Called before freeing a page - it writes back the dirty page. To + launder_page: Called before freeing a page - it writes back the dirty page. To prevent redirtying the page, it is kept locked during the whole operation. is_partially_uptodate: Called by the VM when reading a file through the - pagecache when the underlying blocksize != pagesize. If the required + pagecache when the underlying blocksize != pagesize. If the required block is up to date then the read can complete without needing the IO to bring the whole page up to date. is_dirty_writeback: Called by the VM when attempting to reclaim a page. The VM uses dirty and writeback information to determine if it needs - to stall to allow flushers a chance to complete some IO. Ordinarily + to stall to allow flushers a chance to complete some IO. Ordinarily it can use PageDirty and PageWriteback but some filesystems have more complex state (unstable pages in NFS prevent reclaim) or - do not set those flags due to locking problems. This callback + do not set those flags due to locking problems. This callback allows a filesystem to indicate to the VM if a page should be treated as dirty or writeback for the purposes of stalling. error_remove_page: normally set to generic_error_remove_page if truncation - is ok for this address space. Used for memory failure handling. + is ok for this address space. Used for memory failure handling. Setting this implies you deal with pages going away under you, unless you have them locked or reference counts increased. swap_activate: Called when swapon is used on a file to allocate space if necessary and pin the block lookup information in - memory. A return value of zero indicates success, + memory. A return value of zero indicates success, in which case this file can be used to back swapspace. 
swap_deactivate: Called during swapoff on files where swap_activate @@ -844,14 +844,14 @@ struct address_space_operations { The File Object =============== -A file object represents a file opened by a process. This is also known +A file object represents a file opened by a process. This is also known as an "open file description" in POSIX parlance. struct file_operations ---------------------- -This describes how the VFS can manipulate an open file. As of kernel +This describes how the VFS can manipulate an open file. As of kernel 4.18, the following members are defined: struct file_operations { @@ -916,7 +916,7 @@ otherwise noted. poll: called by the VFS when a process wants to check if there is activity on this file and (optionally) go to sleep until there - is activity. Called by the select(2) and poll(2) system calls + is activity. Called by the select(2) and poll(2) system calls unlocked_ioctl: called by the ioctl(2) system call. @@ -925,13 +925,13 @@ otherwise noted. mmap: called by the mmap(2) system call - open: called by the VFS when an inode should be opened. When the VFS - opens a file, it creates a new "struct file". It then calls the - open method for the newly allocated file structure. You might + open: called by the VFS when an inode should be opened. When the VFS + opens a file, it creates a new "struct file". It then calls the + open method for the newly allocated file structure. You might think that the open method really belongs in - "struct inode_operations", and you may be right. I think it's + "struct inode_operations", and you may be right. I think it's done the way it is because it makes filesystems simpler to - implement. The open() method is a good place to initialize the + implement. The open() method is a good place to initialize the "private_data" member in the file structure if you want to point to a device structure @@ -939,7 +939,7 @@ otherwise noted. 
release: called when the last reference to an open file is closed - fsync: called by the fsync(2) system call. Also see the section above + fsync: called by the fsync(2) system call. Also see the section above entitled "Handling errors during writeback". fasync: called by the fcntl(2) system call when asynchronous @@ -954,13 +954,13 @@ otherwise noted. flock: called by the flock(2) system call - splice_write: called by the VFS to splice data from a pipe to a file. This + splice_write: called by the VFS to splice data from a pipe to a file. This method is used by the splice(2) system call - splice_read: called by the VFS to splice data from file to a pipe. This + splice_read: called by the VFS to splice data from file to a pipe. This method is used by the splice(2) system call - setlease: called by the VFS to set or release a file lock lease. setlease + setlease: called by the VFS to set or release a file lock lease. setlease implementations should call generic_setlease to record or remove the lease in the inode after setting it. @@ -984,12 +984,12 @@ otherwise noted. fadvise: possibly called by the fadvise64() system call. Note that the file operations are implemented by the specific -filesystem in which the inode resides. When opening a device node +filesystem in which the inode resides. When opening a device node (character or block special) most filesystems will call special support routines in the VFS which will locate the required device -driver information. These support routines replace the filesystem file +driver information. These support routines replace the filesystem file operations with those for the device driver, and then proceed to call -the new open() method for the file. This is how opening a device file +the new open() method for the file. This is how opening a device file in the filesystem eventually ends up calling the device driver open() method. 
@@ -1002,10 +1002,10 @@ struct dentry_operations ------------------------ This describes how a filesystem can overload the standard dentry -operations. Dentries and the dcache are the domain of the VFS and the -individual filesystem implementations. Device drivers have no business -here. These methods may be set to NULL, as they are either optional or -the VFS uses a default. As of kernel 2.6.22, the following members are +operations. Dentries and the dcache are the domain of the VFS and the +individual filesystem implementations. Device drivers have no business +here. These methods may be set to NULL, as they are either optional or +the VFS uses a default. As of kernel 2.6.22, the following members are defined: struct dentry_operations { @@ -1024,10 +1024,10 @@ struct dentry_operations { struct dentry *(*d_real)(struct dentry *, const struct inode *); }; - d_revalidate: called when the VFS needs to revalidate a dentry. This + d_revalidate: called when the VFS needs to revalidate a dentry. This is called whenever a name look-up finds a dentry in the - dcache. Most local filesystems leave this as NULL, because all their - dentries in the dcache are valid. Network filesystems are different + dcache. Most local filesystems leave this as NULL, because all their + dentries in the dcache are valid. Network filesystems are different since things can change on the server without the client necessarily being aware of it. @@ -1045,11 +1045,11 @@ struct dentry_operations { d_weak_revalidate: called when the VFS needs to revalidate a "jumped" dentry. This is called when a path-walk ends at dentry that was not acquired by - doing a lookup in the parent directory. This includes "/", "." and "..", + doing a lookup in the parent directory. This includes "/", "." and "..", as well as procfs-style symlinks and mountpoint traversal. In this case, we are less concerned with whether the dentry is still - fully correct, but rather that the inode is still valid. 
As with + fully correct, but rather that the inode is still valid. As with d_revalidate, most local filesystems will set this to NULL since their dcache entries are always valid. @@ -1057,17 +1057,17 @@ struct dentry_operations { d_weak_revalidate is only called after leaving rcu-walk mode. - d_hash: called when the VFS adds a dentry to the hash table. The first + d_hash: called when the VFS adds a dentry to the hash table. The first dentry passed to d_hash is the parent directory that the name is to be hashed into. Same locking and synchronisation rules as d_compare regarding what is safe to dereference etc. - d_compare: called to compare a dentry name with a given name. The first + d_compare: called to compare a dentry name with a given name. The first dentry is the parent of the dentry to be compared, the second is - the child dentry. len and name string are properties of the dentry - to be compared. qstr is the name to compare it with. + the child dentry. len and name string are properties of the dentry + to be compared. qstr is the name to compare it with. Must be constant and idempotent, and should not take locks if possible, and should not or store into the dentry. @@ -1082,9 +1082,9 @@ struct dentry_operations { "rcu-walk", ie. without any locks or references on things. d_delete: called when the last reference to a dentry is dropped and the - dcache is deciding whether or not to cache it. Return 1 to delete - immediately, or 0 to cache the dentry. Default is NULL which means to - always cache a reachable dentry. d_delete must be constant and + dcache is deciding whether or not to cache it. Return 1 to delete + immediately, or 0 to cache the dentry. Default is NULL which means to + always cache a reachable dentry. d_delete must be constant and idempotent. 
d_init: called when a dentry is allocated @@ -1092,19 +1092,19 @@ struct dentry_operations { d_release: called when a dentry is really deallocated d_iput: called when a dentry loses its inode (just prior to its - being deallocated). The default when this is NULL is that the - VFS calls iput(). If you define this method, you must call + being deallocated). The default when this is NULL is that the + VFS calls iput(). If you define this method, you must call iput() yourself d_dname: called when the pathname of a dentry should be generated. Useful for some pseudo filesystems (sockfs, pipefs, ...) to delay - pathname generation. (Instead of doing it when dentry is created, - it's done only when the path is needed.). Real filesystems probably + pathname generation. (Instead of doing it when dentry is created, + it's done only when the path is needed.). Real filesystems probably dont want to use it, because their dentries are present in global - dcache hash, so their hash should be an invariant. As no lock is + dcache hash, so their hash should be an invariant. As no lock is held, d_dname() should not try to modify the dentry itself, unless - appropriate SMP safety is used. CAUTION : d_path() logic is quite - tricky. The correct way to return for example "Hello" is to put it + appropriate SMP safety is used. CAUTION : d_path() logic is quite + tricky. The correct way to return for example "Hello" is to put it at the end of the buffer, and returns a pointer to the first char. dynamic_dname() helper function is provided to take care of this. @@ -1166,7 +1166,7 @@ struct dentry_operations { With NULL inode the topmost real underlying dentry is returned. Each dentry has a pointer to its parent dentry, as well as a hash list -of child dentries. Child dentries are basically like files in a +of child dentries. Child dentries are basically like files in a directory. 
@@ -1179,36 +1179,36 @@ manipulate dentries: dget: open a new handle for an existing dentry (this just increments the usage count) - dput: close a handle for a dentry (decrements the usage count). If + dput: close a handle for a dentry (decrements the usage count). If the usage count drops to 0, and the dentry is still in its parent's hash, the "d_delete" method is called to check whether - it should be cached. If it should not be cached, or if the dentry - is not hashed, it is deleted. Otherwise cached dentries are put + it should be cached. If it should not be cached, or if the dentry + is not hashed, it is deleted. Otherwise cached dentries are put into an LRU list to be reclaimed on memory shortage. - d_drop: this unhashes a dentry from its parents hash list. A + d_drop: this unhashes a dentry from its parents hash list. A subsequent call to dput() will deallocate the dentry if its usage count drops to 0 - d_delete: delete a dentry. If there are no other open references to + d_delete: delete a dentry. If there are no other open references to the dentry then the dentry is turned into a negative dentry - (the d_iput() method is called). If there are other + (the d_iput() method is called). If there are other references, then d_drop() is called instead d_add: add a dentry to its parents hash list and then calls d_instantiate() d_instantiate: add a dentry to the alias hash list for the inode and - updates the "d_inode" member. The "i_count" member in the - inode structure should be set/incremented. If the inode + updates the "d_inode" member. The "i_count" member in the + inode structure should be set/incremented. If the inode pointer is NULL, the dentry is called a "negative - dentry". This function is commonly called when an inode is + dentry". 
This function is commonly called when an inode is created for an existing negative dentry d_lookup: look up a dentry given its parent and path name component It looks up the child of that given name from the dcache - hash table. If it is found, the reference count is incremented - and the dentry is returned. The caller must use dput() + hash table. If it is found, the reference count is incremented + and the dentry is returned. The caller must use dput() to free the dentry when it finishes using it. Mount Options From 90caa781f6402a08b4e602fab7017baa3cee3a28 Mon Sep 17 00:00:00 2001 From: "Tobin C. Harding" Date: Wed, 15 May 2019 10:29:07 +1000 Subject: [PATCH 013/129] docs: filesystems: vfs: Use 72 character column width In preparation for conversion to RST format use the kernel's favoured documentation column width. If we are going to do this we might as well do it thoroughly. Just do the paragraphs (not the indented stuff), the rest will be done during indentation fix up patch. This patch is whitespace only, no textual changes. Use 72 character column width for all paragraph sections. Tested-by: Randy Dunlap Signed-off-by: Tobin C. Harding Signed-off-by: Jonathan Corbet --- Documentation/filesystems/vfs.txt | 198 +++++++++++++++--------------- 1 file changed, 97 insertions(+), 101 deletions(-) diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 6088b925aa7f..1cd0e658137a 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -12,15 +12,14 @@ Introduction ============ -The Virtual File System (also known as the Virtual Filesystem Switch) -is the software layer in the kernel that provides the filesystem -interface to userspace programs. It also provides an abstraction -within the kernel which allows different filesystem implementations to -coexist.
+The Virtual File System (also known as the Virtual Filesystem Switch) is +the software layer in the kernel that provides the filesystem interface +to userspace programs. It also provides an abstraction within the +kernel which allows different filesystem implementations to coexist. -VFS system calls open(2), stat(2), read(2), write(2), chmod(2) and so -on are called from a process context. Filesystem locking is described -in the document Documentation/filesystems/Locking. +VFS system calls open(2), stat(2), read(2), write(2), chmod(2) and so on +are called from a process context. Filesystem locking is described in +the document Documentation/filesystems/Locking. Directory Entry Cache (dcache) @@ -34,11 +33,10 @@ translate a pathname (filename) into a specific dentry. Dentries live in RAM and are never saved to disc: they exist only for performance. The dentry cache is meant to be a view into your entire filespace. As -most computers cannot fit all dentries in the RAM at the same time, -some bits of the cache are missing. In order to resolve your pathname -into a dentry, the VFS may have to resort to creating dentries along -the way, and then loading the inode. This is done by looking up the -inode. +most computers cannot fit all dentries in the RAM at the same time, some +bits of the cache are missing. In order to resolve your pathname into a +dentry, the VFS may have to resort to creating dentries along the way, +and then loading the inode. This is done by looking up the inode. The Inode Object @@ -46,33 +44,32 @@ The Inode Object An individual dentry usually has a pointer to an inode. Inodes are filesystem objects such as regular files, directories, FIFOs and other -beasts. They live either on the disc (for block device filesystems) -or in the memory (for pseudo filesystems). Inodes that live on the -disc are copied into the memory when required and changes to the inode -are written back to disc. A single inode can be pointed to by multiple +beasts. 
They live either on the disc (for block device filesystems) or +in the memory (for pseudo filesystems). Inodes that live on the disc +are copied into the memory when required and changes to the inode are +written back to disc. A single inode can be pointed to by multiple dentries (hard links, for example, do this). To look up an inode requires that the VFS calls the lookup() method of the parent directory inode. This method is installed by the specific -filesystem implementation that the inode lives in. Once the VFS has -the required dentry (and hence the inode), we can do all those boring -things like open(2) the file, or stat(2) it to peek at the inode -data. The stat(2) operation is fairly simple: once the VFS has the -dentry, it peeks at the inode data and passes some of it back to -userspace. +filesystem implementation that the inode lives in. Once the VFS has the +required dentry (and hence the inode), we can do all those boring things +like open(2) the file, or stat(2) it to peek at the inode data. The +stat(2) operation is fairly simple: once the VFS has the dentry, it +peeks at the inode data and passes some of it back to userspace. The File Object --------------- Opening a file requires another operation: allocation of a file -structure (this is the kernel-side implementation of file -descriptors). The freshly allocated file structure is initialized with -a pointer to the dentry and a set of file operation member functions. -These are taken from the inode data. The open() file method is then -called so the specific filesystem implementation can do its work. You -can see that this is another switch performed by the VFS. The file -structure is placed into the file descriptor table for the process. +structure (this is the kernel-side implementation of file descriptors). +The freshly allocated file structure is initialized with a pointer to +the dentry and a set of file operation member functions. These are +taken from the inode data. 
The open() file method is then called so the +specific filesystem implementation can do its work. You can see that +this is another switch performed by the VFS. The file structure is +placed into the file descriptor table for the process. Reading, writing and closing files (and other assorted VFS operations) is done by using the userspace file descriptor to grab the appropriate @@ -93,11 +90,12 @@ functions: extern int unregister_filesystem(struct file_system_type *); The passed struct file_system_type describes your filesystem. When a -request is made to mount a filesystem onto a directory in your namespace, -the VFS will call the appropriate mount() method for the specific -filesystem. New vfsmount referring to the tree returned by ->mount() -will be attached to the mountpoint, so that when pathname resolution -reaches the mountpoint it will jump into the root of that vfsmount. +request is made to mount a filesystem onto a directory in your +namespace, the VFS will call the appropriate mount() method for the +specific filesystem. New vfsmount referring to the tree returned by +->mount() will be attached to the mountpoint, so that when pathname +resolution reaches the mountpoint it will jump into the root of that +vfsmount. You can see all filesystems that are registered to the kernel in the file /proc/filesystems. @@ -156,21 +154,21 @@ The mount() method must return the root dentry of the tree requested by caller. An active reference to its superblock must be grabbed and the superblock must be locked. On failure it should return ERR_PTR(error). -The arguments match those of mount(2) and their interpretation -depends on filesystem type. E.g. for block filesystems, dev_name is -interpreted as block device name, that device is opened and if it -contains a suitable filesystem image the method creates and initializes -struct super_block accordingly, returning its root dentry to caller. 
+The arguments match those of mount(2) and their interpretation depends +on filesystem type. E.g. for block filesystems, dev_name is interpreted +as block device name, that device is opened and if it contains a +suitable filesystem image the method creates and initializes struct +super_block accordingly, returning its root dentry to caller. ->mount() may choose to return a subtree of existing filesystem - it doesn't have to create a new one. The main result from the caller's -point of view is a reference to dentry at the root of (sub)tree to -be attached; creation of new superblock is a common side effect. +point of view is a reference to dentry at the root of (sub)tree to be +attached; creation of new superblock is a common side effect. -The most interesting member of the superblock structure that the -mount() method fills in is the "s_op" field. This is a pointer to -a "struct super_operations" which describes the next level of the -filesystem implementation. +The most interesting member of the superblock structure that the mount() +method fills in is the "s_op" field. This is a pointer to a "struct +super_operations" which describes the next level of the filesystem +implementation. Usually, a filesystem uses one of the generic mount() implementations and provides a fill_super() callback instead. The generic variants are: @@ -317,16 +315,16 @@ or bottom half). implementations will cause holdoff problems due to large scan batch sizes. -Whoever sets up the inode is responsible for filling in the "i_op" field. This -is a pointer to a "struct inode_operations" which describes the methods that -can be performed on individual inodes. +Whoever sets up the inode is responsible for filling in the "i_op" +field. This is a pointer to a "struct inode_operations" which describes +the methods that can be performed on individual inodes. 
struct xattr_handlers --------------------- On filesystems that support extended attributes (xattrs), the s_xattr -superblock field points to a NULL-terminated array of xattr handlers. Extended -attributes are name:value pairs. +superblock field points to a NULL-terminated array of xattr handlers. +Extended attributes are name:value pairs. name: Indicates that the handler matches attributes with the specified name (such as "system.posix_acl_access"); the prefix field must be NULL. @@ -346,9 +344,9 @@ attributes are name:value pairs. attribute. This method is called by the the setxattr(2) and removexattr(2) system calls. -When none of the xattr handlers of a filesystem match the specified attribute -name or when a filesystem doesn't support extended attributes, the various -*xattr(2) system calls return -EOPNOTSUPP. +When none of the xattr handlers of a filesystem match the specified +attribute name or when a filesystem doesn't support extended attributes, +the various *xattr(2) system calls return -EOPNOTSUPP. The Inode Object @@ -360,8 +358,8 @@ An inode object represents an object within the filesystem. struct inode_operations ----------------------- -This describes how the VFS can manipulate an inode in your -filesystem. As of kernel 2.6.22, the following members are defined: +This describes how the VFS can manipulate an inode in your filesystem. +As of kernel 2.6.22, the following members are defined: struct inode_operations { int (*create) (struct inode *,struct dentry *, umode_t, bool); @@ -517,42 +515,40 @@ The Address Space Object ======================== The address space object is used to group and manage pages in the page -cache. It can be used to keep track of the pages in a file (or -anything else) and also track the mapping of sections of the file into -process address spaces. +cache. It can be used to keep track of the pages in a file (or anything +else) and also track the mapping of sections of the file into process +address spaces. 
There are a number of distinct yet related services that an -address-space can provide. These include communicating memory -pressure, page lookup by address, and keeping track of pages tagged as -Dirty or Writeback. +address-space can provide. These include communicating memory pressure, +page lookup by address, and keeping track of pages tagged as Dirty or +Writeback. The first can be used independently to the others. The VM can try to -either write dirty pages in order to clean them, or release clean -pages in order to reuse them. To do this it can call the ->writepage -method on dirty pages, and ->releasepage on clean pages with -PagePrivate set. Clean pages without PagePrivate and with no external -references will be released without notice being given to the -address_space. +either write dirty pages in order to clean them, or release clean pages +in order to reuse them. To do this it can call the ->writepage method +on dirty pages, and ->releasepage on clean pages with PagePrivate set. +Clean pages without PagePrivate and with no external references will be +released without notice being given to the address_space. To achieve this functionality, pages need to be placed on an LRU with -lru_cache_add and mark_page_active needs to be called whenever the -page is used. +lru_cache_add and mark_page_active needs to be called whenever the page +is used. Pages are normally kept in a radix tree index by ->index. This tree -maintains information about the PG_Dirty and PG_Writeback status of -each page, so that pages with either of these flags can be found -quickly. +maintains information about the PG_Dirty and PG_Writeback status of each +page, so that pages with either of these flags can be found quickly. The Dirty tag is primarily used by mpage_writepages - the default ->writepages method. It uses the tag to find dirty pages to call ->writepage on. If mpage_writepages is not used (i.e. 
the address -provides its own ->writepages) , the PAGECACHE_TAG_DIRTY tag is -almost unused. write_inode_now and sync_inode do use it (through +provides its own ->writepages) , the PAGECACHE_TAG_DIRTY tag is almost +unused. write_inode_now and sync_inode do use it (through __sync_single_inode) to check if ->writepages has been successful in writing out the whole address_space. -The Writeback tag is used by filemap*wait* and sync_page* functions, -via filemap_fdatawait_range, to wait for all writeback to complete. +The Writeback tag is used by filemap*wait* and sync_page* functions, via +filemap_fdatawait_range, to wait for all writeback to complete. An address_space handler may attach extra information to a page, typically using the 'private' field in the 'struct page'. If such @@ -562,25 +558,24 @@ handler to deal with that data. An address space acts as an intermediate between storage and application. Data is read into the address space a whole page at a -time, and provided to the application either by copying of the page, -or by memory-mapping the page. -Data is written into the address space by the application, and then -written-back to storage typically in whole pages, however the -address_space has finer control of write sizes. +time, and provided to the application either by copying of the page, or +by memory-mapping the page. Data is written into the address space by +the application, and then written-back to storage typically in whole +pages, however the address_space has finer control of write sizes. The read process essentially only requires 'readpage'. The write process is more complicated and uses write_begin/write_end or -set_page_dirty to write data into the address_space, and writepage -and writepages to writeback data to storage. +set_page_dirty to write data into the address_space, and writepage and +writepages to writeback data to storage. Adding and removing pages to/from an address_space is protected by the inode's i_mutex. 
When data is written to a page, the PG_Dirty flag should be set. It typically remains set until writepage asks for it to be written. This -should clear PG_Dirty and set PG_Writeback. It can be actually -written at any point after PG_Dirty is clear. Once it is known to be -safe, PG_Writeback is cleared. +should clear PG_Dirty and set PG_Writeback. It can be actually written +at any point after PG_Dirty is clear. Once it is known to be safe, +PG_Writeback is cleared. Writeback makes use of a writeback_control structure to direct the operations. This gives the the writepage and writepages operations some @@ -609,9 +604,10 @@ file descriptors should get back an error is not possible. Instead, the generic writeback error tracking infrastructure in the kernel settles for reporting errors to fsync on all file descriptions that were open at the time that the error occurred. In a situation with -multiple writers, all of them will get back an error on a subsequent fsync, -even if all of the writes done through that particular file descriptor -succeeded (or even if there were no writes on that file descriptor at all). +multiple writers, all of them will get back an error on a subsequent +fsync, even if all of the writes done through that particular file +descriptor succeeded (or even if there were no writes on that file +descriptor at all). Filesystems that wish to use this infrastructure should call mapping_set_error to record the error in the address_space when it @@ -623,8 +619,8 @@ point in the stream of errors emitted by the backing device(s). struct address_space_operations ------------------------------- -This describes how the VFS can manipulate mapping of a file to page cache in -your filesystem. The following members are defined: +This describes how the VFS can manipulate mapping of a file to page +cache in your filesystem. 
The following members are defined: struct address_space_operations { int (*writepage)(struct page *page, struct writeback_control *wbc); @@ -1231,8 +1227,8 @@ filesystems. Showing options --------------- -If a filesystem accepts mount options, it must define show_options() -to show all the currently active options. The rules are: +If a filesystem accepts mount options, it must define show_options() to +show all the currently active options. The rules are: - options MUST be shown which are not default or their values differ from the default @@ -1240,14 +1236,14 @@ to show all the currently active options. The rules are: - options MAY be shown which are enabled by default or have their default value -Options used only internally between a mount helper and the kernel -(such as file descriptors), or which only have an effect during the -mounting (such as ones controlling the creation of a journal) are exempt -from the above rules. +Options used only internally between a mount helper and the kernel (such +as file descriptors), or which only have an effect during the mounting +(such as ones controlling the creation of a journal) are exempt from the +above rules. -The underlying reason for the above rules is to make sure, that a -mount can be accurately replicated (e.g. umounting and mounting again) -based on the information found in /proc/mounts. +The underlying reason for the above rules is to make sure, that a mount +can be accurately replicated (e.g. umounting and mounting again) based +on the information found in /proc/mounts. Resources ========= From e04c83cd53b59e422157c4cea0cdc4e2f33fe305 Mon Sep 17 00:00:00 2001 From: "Tobin C. Harding" Date: Wed, 15 May 2019 10:29:08 +1000 Subject: [PATCH 014/129] docs: filesystems: vfs: Use uniform spacing around headings Currently spacing before and after headings is non-uniform. Use two blank lines before a heading and one after the heading. Use uniform spacing around headings. Tested-by: Randy Dunlap Signed-off-by: Tobin C. 
Harding Signed-off-by: Jonathan Corbet --- Documentation/filesystems/vfs.txt | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 1cd0e658137a..242fd644c97b 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -319,6 +319,7 @@ Whoever sets up the inode is responsible for filling in the "i_op" field. This is a pointer to a "struct inode_operations" which describes the methods that can be performed on individual inodes. + struct xattr_handlers --------------------- @@ -511,6 +512,7 @@ otherwise noted. tmpfile: called in the end of O_TMPFILE open(). Optional, equivalent to atomically creating, opening and unlinking a file in given directory. + The Address Space Object ======================== @@ -584,8 +586,10 @@ and the constraints under which it is being done. It is also used to return information back to the caller about the result of a writepage or writepages request. + Handling errors during writeback -------------------------------- + Most applications that do buffered I/O will periodically call a file synchronization call (fsync, fdatasync, msync or sync_file_range) to ensure that data written has made it to the backing store. When there @@ -616,6 +620,7 @@ file->fsync operation, they should call file_check_and_advance_wb_err to ensure that the struct file's error cursor has advanced to the correct point in the stream of errors emitted by the backing device(s). + struct address_space_operations ------------------------------- @@ -1207,9 +1212,11 @@ manipulate dentries: and the dentry is returned. The caller must use dput() to free the dentry when it finishes using it. + Mount Options ============= + Parsing options --------------- @@ -1224,6 +1231,7 @@ The header defines an API that helps parse these options. There are plenty of examples on how to use it in existing filesystems. 
+ Showing options --------------- @@ -1245,6 +1253,7 @@ The underlying reason for the above rules is to make sure, that a mount can be accurately replicated (e.g. umounting and mounting again) based on the information found in /proc/mounts. + Resources ========= From 90ac11a844f8859d5f960fb530190a9690a9a19b Mon Sep 17 00:00:00 2001 From: "Tobin C. Harding" Date: Wed, 15 May 2019 10:29:09 +1000 Subject: [PATCH 015/129] docs: filesystems: vfs: Use correct initial heading Kernel RST has a preferred heading adornment scheme. Currently all the heading adornments follow this scheme except the document heading. Use correct heading adornment for initial heading. Tested-by: Randy Dunlap Signed-off-by: Tobin C. Harding Signed-off-by: Jonathan Corbet --- Documentation/filesystems/vfs.txt | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 242fd644c97b..1167dd94d84b 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -1,5 +1,6 @@ - - Overview of the Linux Virtual File System +========================================= +Overview of the Linux Virtual File System +========================================= Original author: Richard Gooch From 099c5c7a3fba0c4686090075c6d214355aa67e47 Mon Sep 17 00:00:00 2001 From: "Tobin C. Harding" Date: Wed, 15 May 2019 10:29:10 +1000 Subject: [PATCH 016/129] docs: filesystems: vfs: Use SPDX identifier Currently the licence is indicated via a custom string. We have SPDX license identifiers now for this task. Use SPDX license identifier matching current license string. Tested-by: Randy Dunlap Signed-off-by: Tobin C. 
Harding Signed-off-by: Jonathan Corbet --- Documentation/filesystems/vfs.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 1167dd94d84b..bd6dd782e8ca 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -1,3 +1,5 @@ +.. SPDX-License-Identifier: GPL-2.0 + ========================================= Overview of the Linux Virtual File System ========================================= @@ -7,8 +9,6 @@ Overview of the Linux Virtual File System Copyright (C) 1999 Richard Gooch Copyright (C) 2005 Pekka Enberg - This file is released under the GPLv2. - Introduction ============ From e66b045715457ca6e18fce2b2fc61dd8af2e2440 Mon Sep 17 00:00:00 2001 From: "Tobin C. Harding" Date: Wed, 15 May 2019 10:29:11 +1000 Subject: [PATCH 017/129] docs: filesystems: vfs: Fix pre-amble indentation Currently file pre-amble contains custom indentation. RST is not going to like this, lets left-align the text. Put the copyright notices in a list in preparation for converting document to RST. Tested-by: Randy Dunlap Signed-off-by: Tobin C. Harding Signed-off-by: Jonathan Corbet --- Documentation/filesystems/vfs.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index bd6dd782e8ca..9ed5c8d6e656 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -4,10 +4,10 @@ Overview of the Linux Virtual File System ========================================= - Original author: Richard Gooch +Original author: Richard Gooch - Copyright (C) 1999 Richard Gooch - Copyright (C) 2005 Pekka Enberg +- Copyright (C) 1999 Richard Gooch +- Copyright (C) 2005 Pekka Enberg Introduction From 1b44ae63deae020e172866871bd14a76376e0f8b Mon Sep 17 00:00:00 2001 From: "Tobin C. 
Harding" Date: Wed, 15 May 2019 10:29:12 +1000 Subject: [PATCH 018/129] docs: filesystems: vfs: Convert spaces to tabs There are bunch of places with 8 spaces, in preparation for correctly indenting all code snippets (during conversion to RST) change these to use tabspaces. This patch is whitespace only. Convert instances of 8 consecutive spaces to a single tabspace. Signed-off-by: Tobin C. Harding Signed-off-by: Jonathan Corbet --- Documentation/filesystems/vfs.txt | 118 +++++++++++++++--------------- 1 file changed, 59 insertions(+), 59 deletions(-) diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 9ed5c8d6e656..4f4f4931bfa0 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -111,12 +111,12 @@ members are defined: struct file_system_type { const char *name; int fs_flags; - struct dentry *(*mount) (struct file_system_type *, int, - const char *, void *); - void (*kill_sb) (struct super_block *); - struct module *owner; - struct file_system_type * next; - struct list_head fs_supers; + struct dentry *(*mount) (struct file_system_type *, int, + const char *, void *); + void (*kill_sb) (struct super_block *); + struct module *owner; + struct file_system_type * next; + struct list_head fs_supers; struct lock_class_key s_lock_key; struct lock_class_key s_umount_key; }; @@ -205,26 +205,26 @@ This describes how the VFS can manipulate the superblock of your filesystem. 
As of kernel 2.6.22, the following members are defined: struct super_operations { - struct inode *(*alloc_inode)(struct super_block *sb); - void (*destroy_inode)(struct inode *); + struct inode *(*alloc_inode)(struct super_block *sb); + void (*destroy_inode)(struct inode *); - void (*dirty_inode) (struct inode *, int flags); - int (*write_inode) (struct inode *, int); - void (*drop_inode) (struct inode *); - void (*delete_inode) (struct inode *); - void (*put_super) (struct super_block *); - int (*sync_fs)(struct super_block *sb, int wait); - int (*freeze_fs) (struct super_block *); - int (*unfreeze_fs) (struct super_block *); - int (*statfs) (struct dentry *, struct kstatfs *); - int (*remount_fs) (struct super_block *, int *, char *); - void (*clear_inode) (struct inode *); - void (*umount_begin) (struct super_block *); + void (*dirty_inode) (struct inode *, int flags); + int (*write_inode) (struct inode *, int); + void (*drop_inode) (struct inode *); + void (*delete_inode) (struct inode *); + void (*put_super) (struct super_block *); + int (*sync_fs)(struct super_block *sb, int wait); + int (*freeze_fs) (struct super_block *); + int (*unfreeze_fs) (struct super_block *); + int (*statfs) (struct dentry *, struct kstatfs *); + int (*remount_fs) (struct super_block *, int *, char *); + void (*clear_inode) (struct inode *); + void (*umount_begin) (struct super_block *); - int (*show_options)(struct seq_file *, struct dentry *); + int (*show_options)(struct seq_file *, struct dentry *); - ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t); - ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t); + ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t); + ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t); int (*nr_cached_objects)(struct super_block *); void (*free_cached_objects)(struct super_block *, int); }; @@ -479,7 +479,7 @@ otherwise noted. filesystem. 
May be called in rcu-walk mode (mask & MAY_NOT_BLOCK). If in rcu-walk - mode, the filesystem must check the permission without blocking or + mode, the filesystem must check the permission without blocking or storing to the inode. If a situation is encountered that rcu-walk cannot handle, return @@ -698,12 +698,12 @@ struct address_space_operations { tagged as DIRTY and will pass them to ->writepage. set_page_dirty: called by the VM to set a page dirty. - This is particularly needed if an address space attaches - private data to a page, and that data needs to be updated when - a page is dirtied. This is called, for example, when a memory + This is particularly needed if an address space attaches + private data to a page, and that data needs to be updated when + a page is dirtied. This is called, for example, when a memory mapped page gets modified. If defined, it should set the PageDirty flag, and the - PAGECACHE_TAG_DIRTY tag in the radix tree. + PAGECACHE_TAG_DIRTY tag in the radix tree. readpages: called by the VM to read pages associated with the address_space object. This is essentially just a vector version of @@ -721,7 +721,7 @@ struct address_space_operations { storage, then those blocks should be pre-read (if they haven't been read already) so that the updated blocks can be written out properly. - The filesystem must return the locked pagecache page for the specified + The filesystem must return the locked pagecache page for the specified offset, in *pagep, for the caller to write into. It must be able to cope with short writes (where the length passed to @@ -730,21 +730,21 @@ struct address_space_operations { flags is a field for AOP_FLAG_xxx flags, described in include/linux/fs.h. - A void * may be returned in fsdata, which then gets passed into - write_end. + A void * may be returned in fsdata, which then gets passed into + write_end. 
- Returns 0 on success; < 0 on failure (which is the error code), in + Returns 0 on success; < 0 on failure (which is the error code), in which case write_end is not called. write_end: After a successful write_begin, and data copy, write_end must - be called. len is the original len passed to write_begin, and copied - is the amount that was able to be copied. + be called. len is the original len passed to write_begin, and copied + is the amount that was able to be copied. - The filesystem must take care of unlocking the page and releasing it - refcount, and updating i_size. + The filesystem must take care of unlocking the page and releasing it + refcount, and updating i_size. - Returns < 0 on failure, otherwise the number of bytes (<= 'copied') - that were able to be copied into pagecache. + Returns < 0 on failure, otherwise the number of bytes (<= 'copied') + that were able to be copied into pagecache. bmap: called by the VFS to map a logical block offset within object to physical block number. This method is used by the FIBMAP @@ -755,7 +755,7 @@ struct address_space_operations { are and uses those addresses directly. invalidatepage: If a page has PagePrivate set, then invalidatepage - will be called when part or all of the page is to be removed + will be called when part or all of the page is to be removed from the address space. This generally corresponds to either a truncation, punch hole or a complete invalidation of the address space (in the latter case 'offset' will always be 0 and 'length' @@ -767,47 +767,47 @@ struct address_space_operations { release MUST succeed. releasepage: releasepage is called on PagePrivate pages to indicate - that the page should be freed if possible. ->releasepage - should remove any private data from the page and clear the - PagePrivate flag. If releasepage() fails for some reason, it must + that the page should be freed if possible. ->releasepage + should remove any private data from the page and clear the + PagePrivate flag. 
If releasepage() fails for some reason, it must indicate failure with a 0 return value. releasepage() is used in two distinct though related cases. The first is when the VM finds a clean page with no active users and - wants to make it a free page. If ->releasepage succeeds, the - page will be removed from the address_space and become free. + wants to make it a free page. If ->releasepage succeeds, the + page will be removed from the address_space and become free. The second case is when a request has been made to invalidate - some or all pages in an address_space. This can happen - through the fadvise(POSIX_FADV_DONTNEED) system call or by the - filesystem explicitly requesting it as nfs and 9fs do (when - they believe the cache may be out of date with storage) by - calling invalidate_inode_pages2(). + some or all pages in an address_space. This can happen + through the fadvise(POSIX_FADV_DONTNEED) system call or by the + filesystem explicitly requesting it as nfs and 9fs do (when + they believe the cache may be out of date with storage) by + calling invalidate_inode_pages2(). If the filesystem makes such a call, and needs to be certain - that all pages are invalidated, then its releasepage will - need to ensure this. Possibly it can clear the PageUptodate - bit if it cannot free private data yet. + that all pages are invalidated, then its releasepage will + need to ensure this. Possibly it can clear the PageUptodate + bit if it cannot free private data yet. freepage: freepage is called once the page is no longer visible in - the page cache in order to allow the cleanup of any private + the page cache in order to allow the cleanup of any private data. Since it may be called by the memory reclaimer, it should not assume that the original address_space mapping still exists, and it should not block. 
direct_IO: called by the generic read/write routines to perform - direct_IO - that is IO requests which bypass the page cache - and transfer data directly between the storage and the - application's address space. + direct_IO - that is IO requests which bypass the page cache + and transfer data directly between the storage and the + application's address space. isolate_page: Called by the VM when isolating a movable non-lru page. If page is successfully isolated, VM marks the page as PG_isolated via __SetPageIsolated. migrate_page: This is used to compact the physical memory usage. - If the VM wants to relocate a page (maybe off a memory card - that is signalling imminent failure) it will pass a new page + If the VM wants to relocate a page (maybe off a memory card + that is signalling imminent failure) it will pass a new page and an old page to this function. migrate_page should transfer any private data across and update any references - that it has to the page. + that it has to the page. putback_page: Called by the VM when isolated page's migration fails. From af96c1e304f7051bf2ee64c9957724bdace05c58 Mon Sep 17 00:00:00 2001 From: "Tobin C. Harding" Date: Wed, 15 May 2019 10:29:13 +1000 Subject: [PATCH 019/129] docs: filesystems: vfs: Convert vfs.txt to RST vfs.txt is currently stale. If we convert it to RST this is a good first step in the process of getting the VFS documentation up to date. This patch does the following (all as a single patch so as not to introduce any new SPHINX build warnings) - Use '.. code-block:: c' for C code blocks and indent the code blocks. - Use double backticks for struct member descriptions. - Fix a couple of build warnings by guarding pointers (*) with double backticks .e.g ``*ptr``. - Add vfs to Documentation/filesystems/index.rst The member descriptions paragraph indentation was not touched. It is not pretty but these do not cause build warnings. These descriptions all need updating anyways so leave it as it is for now. 
Signed-off-by: Tobin C. Harding Signed-off-by: Jonathan Corbet --- Documentation/filesystems/index.rst | 1 + .../filesystems/{vfs.txt => vfs.rst} | 565 +++++++++--------- 2 files changed, 292 insertions(+), 274 deletions(-) rename Documentation/filesystems/{vfs.txt => vfs.rst} (70%) diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst index 1131c34d77f6..35644840a690 100644 --- a/Documentation/filesystems/index.rst +++ b/Documentation/filesystems/index.rst @@ -16,6 +16,7 @@ algorithms work. .. toctree:: :maxdepth: 2 + vfs path-lookup.rst api-summary splice diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.rst similarity index 70% rename from Documentation/filesystems/vfs.txt rename to Documentation/filesystems/vfs.rst index 4f4f4931bfa0..2ffbdf5f392c 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.rst @@ -85,10 +85,12 @@ Registering and Mounting a Filesystem To register and unregister a filesystem, use the following API functions: - #include +.. code-block:: c - extern int register_filesystem(struct file_system_type *); - extern int unregister_filesystem(struct file_system_type *); + #include + + extern int register_filesystem(struct file_system_type *); + extern int unregister_filesystem(struct file_system_type *); The passed struct file_system_type describes your filesystem. When a request is made to mount a filesystem onto a directory in your @@ -108,47 +110,49 @@ struct file_system_type This describes the filesystem. As of kernel 2.6.39, the following members are defined: -struct file_system_type { - const char *name; - int fs_flags; - struct dentry *(*mount) (struct file_system_type *, int, - const char *, void *); - void (*kill_sb) (struct super_block *); - struct module *owner; - struct file_system_type * next; - struct list_head fs_supers; - struct lock_class_key s_lock_key; - struct lock_class_key s_umount_key; -}; +.. 
code-block:: c - name: the name of the filesystem type, such as "ext2", "iso9660", + struct file_system_operations { + const char *name; + int fs_flags; + struct dentry *(*mount) (struct file_system_type *, int, + const char *, void *); + void (*kill_sb) (struct super_block *); + struct module *owner; + struct file_system_type * next; + struct list_head fs_supers; + struct lock_class_key s_lock_key; + struct lock_class_key s_umount_key; + }; + +``name``: the name of the filesystem type, such as "ext2", "iso9660", "msdos" and so on - fs_flags: various flags (i.e. FS_REQUIRES_DEV, FS_NO_DCACHE, etc.) +``fs_flags``: various flags (i.e. FS_REQUIRES_DEV, FS_NO_DCACHE, etc.) - mount: the method to call when a new instance of this - filesystem should be mounted +``mount``: the method to call when a new instance of this filesystem should +be mounted - kill_sb: the method to call when an instance of this filesystem +``kill_sb``: the method to call when an instance of this filesystem should be shut down - owner: for internal VFS use: you should initialize this to THIS_MODULE in +``owner``: for internal VFS use: you should initialize this to THIS_MODULE in most cases. - next: for internal VFS use: you should initialize this to NULL +``next``: for internal VFS use: you should initialize this to NULL s_lock_key, s_umount_key: lockdep-specific The mount() method has the following arguments: - struct file_system_type *fs_type: describes the filesystem, partly initialized +``struct file_system_type *fs_type``: describes the filesystem, partly initialized by the specific filesystem code - int flags: mount flags +``int flags``: mount flags - const char *dev_name: the device name we are mounting. +``const char *dev_name``: the device name we are mounting. 
- void *data: arbitrary mount options, usually comes as an ASCII +``void *data``: arbitrary mount options, usually comes as an ASCII string (see "Mount Options" section) The mount() method must return the root dentry of the tree requested by @@ -174,22 +178,22 @@ implementation. Usually, a filesystem uses one of the generic mount() implementations and provides a fill_super() callback instead. The generic variants are: - mount_bdev: mount a filesystem residing on a block device +``mount_bdev``: mount a filesystem residing on a block device - mount_nodev: mount a filesystem that is not backed by a device +``mount_nodev``: mount a filesystem that is not backed by a device - mount_single: mount a filesystem which shares the instance between +``mount_single``: mount a filesystem which shares the instance between all mounts A fill_super() callback implementation has the following arguments: - struct super_block *sb: the superblock structure. The callback +``struct super_block *sb``: the superblock structure. The callback must initialize this properly. - void *data: arbitrary mount options, usually comes as an ASCII +``void *data``: arbitrary mount options, usually comes as an ASCII string (see "Mount Options" section) - int silent: whether or not to be silent on error +``int silent``: whether or not to be silent on error The Superblock Object @@ -204,54 +208,56 @@ struct super_operations This describes how the VFS can manipulate the superblock of your filesystem. As of kernel 2.6.22, the following members are defined: -struct super_operations { - struct inode *(*alloc_inode)(struct super_block *sb); - void (*destroy_inode)(struct inode *); +.. 
code-block:: c - void (*dirty_inode) (struct inode *, int flags); - int (*write_inode) (struct inode *, int); - void (*drop_inode) (struct inode *); - void (*delete_inode) (struct inode *); - void (*put_super) (struct super_block *); - int (*sync_fs)(struct super_block *sb, int wait); - int (*freeze_fs) (struct super_block *); - int (*unfreeze_fs) (struct super_block *); - int (*statfs) (struct dentry *, struct kstatfs *); - int (*remount_fs) (struct super_block *, int *, char *); - void (*clear_inode) (struct inode *); - void (*umount_begin) (struct super_block *); + struct super_operations { + struct inode *(*alloc_inode)(struct super_block *sb); + void (*destroy_inode)(struct inode *); - int (*show_options)(struct seq_file *, struct dentry *); + void (*dirty_inode) (struct inode *, int flags); + int (*write_inode) (struct inode *, int); + void (*drop_inode) (struct inode *); + void (*delete_inode) (struct inode *); + void (*put_super) (struct super_block *); + int (*sync_fs)(struct super_block *sb, int wait); + int (*freeze_fs) (struct super_block *); + int (*unfreeze_fs) (struct super_block *); + int (*statfs) (struct dentry *, struct kstatfs *); + int (*remount_fs) (struct super_block *, int *, char *); + void (*clear_inode) (struct inode *); + void (*umount_begin) (struct super_block *); - ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t); - ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t); - int (*nr_cached_objects)(struct super_block *); - void (*free_cached_objects)(struct super_block *, int); -}; + int (*show_options)(struct seq_file *, struct dentry *); + + ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t); + ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t); + int (*nr_cached_objects)(struct super_block *); + void (*free_cached_objects)(struct super_block *, int); + }; All methods are called without any locks being held, unless otherwise noted. 
This means that most methods can block safely. All methods are only called from a process context (i.e. not from an interrupt handler or bottom half). - alloc_inode: this method is called by alloc_inode() to allocate memory +``alloc_inode``: this method is called by alloc_inode() to allocate memory for struct inode and initialize it. If this function is not defined, a simple 'struct inode' is allocated. Normally alloc_inode will be used to allocate a larger structure which contains a 'struct inode' embedded within it. - destroy_inode: this method is called by destroy_inode() to release +``destroy_inode``: this method is called by destroy_inode() to release resources allocated for struct inode. It is only required if ->alloc_inode was defined and simply undoes anything done by ->alloc_inode. - dirty_inode: this method is called by the VFS to mark an inode dirty. +``dirty_inode``: this method is called by the VFS to mark an inode dirty. - write_inode: this method is called when the VFS needs to write an +``write_inode``: this method is called when the VFS needs to write an inode to disc. The second parameter indicates whether the write should be synchronous or not, not all filesystems check this flag. - drop_inode: called when the last access to the inode is dropped, +``drop_inode``: called when the last access to the inode is dropped, with the inode->i_lock spinlock held. This method should be either NULL (normal UNIX filesystem @@ -264,43 +270,43 @@ or bottom half). but does not have the races that the "force_delete()" approach had. - delete_inode: called when the VFS wants to delete an inode +``delete_inode``: called when the VFS wants to delete an inode - put_super: called when the VFS wishes to free the superblock +``put_super``: called when the VFS wishes to free the superblock (i.e. unmount). 
This is called with the superblock lock held - sync_fs: called when VFS is writing out all dirty data associated with +``sync_fs``: called when VFS is writing out all dirty data associated with a superblock. The second parameter indicates whether the method should wait until the write out has been completed. Optional. - freeze_fs: called when VFS is locking a filesystem and +``freeze_fs``: called when VFS is locking a filesystem and forcing it into a consistent state. This method is currently used by the Logical Volume Manager (LVM). - unfreeze_fs: called when VFS is unlocking a filesystem and making it writable +``unfreeze_fs``: called when VFS is unlocking a filesystem and making it writable again. - statfs: called when the VFS needs to get filesystem statistics. +``statfs``: called when the VFS needs to get filesystem statistics. - remount_fs: called when the filesystem is remounted. This is called +``remount_fs``: called when the filesystem is remounted. This is called with the kernel lock held - clear_inode: called then the VFS clears the inode. Optional +``clear_inode``: called then the VFS clears the inode. Optional - umount_begin: called when the VFS is unmounting a filesystem. +``umount_begin``: called when the VFS is unmounting a filesystem. - show_options: called by the VFS to show mount options for +``show_options``: called by the VFS to show mount options for /proc/<pid>/mounts. (see "Mount Options" section) - quota_read: called by the VFS to read from filesystem quota file. +``quota_read``: called by the VFS to read from filesystem quota file. - quota_write: called by the VFS to write to filesystem quota file. +``quota_write``: called by the VFS to write to filesystem quota file. - nr_cached_objects: called by the sb cache shrinking function for the +``nr_cached_objects``: called by the sb cache shrinking function for the filesystem to return the number of freeable cached objects it contains. Optional.
- free_cache_objects: called by the sb cache shrinking function for the +``free_cache_objects``: called by the sb cache shrinking function for the filesystem to scan the number of objects indicated to try to free them. Optional, but any filesystem implementing this method needs to also implement ->nr_cached_objects for it to be called correctly. @@ -328,27 +334,27 @@ On filesystems that support extended attributes (xattrs), the s_xattr superblock field points to a NULL-terminated array of xattr handlers. Extended attributes are name:value pairs. - name: Indicates that the handler matches attributes with the specified name +``name``: Indicates that the handler matches attributes with the specified name (such as "system.posix_acl_access"); the prefix field must be NULL. - prefix: Indicates that the handler matches all attributes with the specified +``prefix``: Indicates that the handler matches all attributes with the specified name prefix (such as "user."); the name field must be NULL. - list: Determine if attributes matching this xattr handler should be listed +``list``: Determine if attributes matching this xattr handler should be listed for a particular dentry. Used by some listxattr implementations like generic_listxattr. - get: Called by the VFS to get the value of a particular extended attribute. +``get``: Called by the VFS to get the value of a particular extended attribute. This method is called by the getxattr(2) system call. - set: Called by the VFS to set the value of a particular extended attribute. +``set``: Called by the VFS to set the value of a particular extended attribute. When the new value is NULL, called to remove a particular extended attribute. This method is called by the the setxattr(2) and removexattr(2) system calls. When none of the xattr handlers of a filesystem match the specified attribute name or when a filesystem doesn't support extended attributes, -the various *xattr(2) system calls return -EOPNOTSUPP. 
+the various ``*xattr(2)`` system calls return -EOPNOTSUPP. The Inode Object @@ -363,41 +369,43 @@ struct inode_operations This describes how the VFS can manipulate an inode in your filesystem. As of kernel 2.6.22, the following members are defined: -struct inode_operations { - int (*create) (struct inode *,struct dentry *, umode_t, bool); - struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int); - int (*link) (struct dentry *,struct inode *,struct dentry *); - int (*unlink) (struct inode *,struct dentry *); - int (*symlink) (struct inode *,struct dentry *,const char *); - int (*mkdir) (struct inode *,struct dentry *,umode_t); - int (*rmdir) (struct inode *,struct dentry *); - int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t); - int (*rename) (struct inode *, struct dentry *, - struct inode *, struct dentry *, unsigned int); - int (*readlink) (struct dentry *, char __user *,int); - const char *(*get_link) (struct dentry *, struct inode *, - struct delayed_call *); - int (*permission) (struct inode *, int); - int (*get_acl)(struct inode *, int); - int (*setattr) (struct dentry *, struct iattr *); - int (*getattr) (const struct path *, struct kstat *, u32, unsigned int); - ssize_t (*listxattr) (struct dentry *, char *, size_t); - void (*update_time)(struct inode *, struct timespec *, int); - int (*atomic_open)(struct inode *, struct dentry *, struct file *, - unsigned open_flag, umode_t create_mode); - int (*tmpfile) (struct inode *, struct dentry *, umode_t); -}; +.. 
code-block:: c + + struct inode_operations { + int (*create) (struct inode *,struct dentry *, umode_t, bool); + struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int); + int (*link) (struct dentry *,struct inode *,struct dentry *); + int (*unlink) (struct inode *,struct dentry *); + int (*symlink) (struct inode *,struct dentry *,const char *); + int (*mkdir) (struct inode *,struct dentry *,umode_t); + int (*rmdir) (struct inode *,struct dentry *); + int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t); + int (*rename) (struct inode *, struct dentry *, + struct inode *, struct dentry *, unsigned int); + int (*readlink) (struct dentry *, char __user *,int); + const char *(*get_link) (struct dentry *, struct inode *, + struct delayed_call *); + int (*permission) (struct inode *, int); + int (*get_acl)(struct inode *, int); + int (*setattr) (struct dentry *, struct iattr *); + int (*getattr) (const struct path *, struct kstat *, u32, unsigned int); + ssize_t (*listxattr) (struct dentry *, char *, size_t); + void (*update_time)(struct inode *, struct timespec *, int); + int (*atomic_open)(struct inode *, struct dentry *, struct file *, + unsigned open_flag, umode_t create_mode); + int (*tmpfile) (struct inode *, struct dentry *, umode_t); + }; Again, all methods are called without any locks being held, unless otherwise noted. - create: called by the open(2) and creat(2) system calls. Only +``create``: called by the open(2) and creat(2) system calls. Only required if you want to support regular files. The dentry you get should not have an inode (i.e. it should be a negative dentry). Here you will probably call d_instantiate() with the dentry and the newly created inode - lookup: called when the VFS needs to look up an inode in a parent +``lookup``: called when the VFS needs to look up an inode in a parent directory. The name to look for is found in the dentry. This method must call d_add() to insert the found inode into the dentry. 
The "i_count" field in the inode structure should be @@ -411,31 +419,31 @@ otherwise noted. to a struct "dentry_operations". This method is called with the directory inode semaphore held - link: called by the link(2) system call. Only required if you want +``link``: called by the link(2) system call. Only required if you want to support hard links. You will probably need to call d_instantiate() just as you would in the create() method - unlink: called by the unlink(2) system call. Only required if you +``unlink``: called by the unlink(2) system call. Only required if you want to support deleting inodes - symlink: called by the symlink(2) system call. Only required if you +``symlink``: called by the symlink(2) system call. Only required if you want to support symlinks. You will probably need to call d_instantiate() just as you would in the create() method - mkdir: called by the mkdir(2) system call. Only required if you want +``mkdir``: called by the mkdir(2) system call. Only required if you want to support creating subdirectories. You will probably need to call d_instantiate() just as you would in the create() method - rmdir: called by the rmdir(2) system call. Only required if you want +``rmdir``: called by the rmdir(2) system call. Only required if you want to support deleting subdirectories - mknod: called by the mknod(2) system call to create a device (char, +``mknod``: called by the mknod(2) system call to create a device (char, block) inode or a named pipe (FIFO) or socket. Only required if you want to support creating these types of inodes. You will probably need to call d_instantiate() just as you would in the create() method - rename: called by the rename(2) system call to rename the object to +``rename``: called by the rename(2) system call to rename the object to have the parent and name given by the second inode and dentry. The filesystem must return -EINVAL for any unsupported or @@ -449,7 +457,7 @@ otherwise noted. exist; this is checked by the VFS. 
Unlike plain rename, source and target may be of different type. - get_link: called by the VFS to follow a symbolic link to the +``get_link``: called by the VFS to follow a symbolic link to the inode it points to. Only required if you want to support symbolic links. This method returns the symlink body to traverse (and possibly resets the current position with @@ -463,19 +471,20 @@ otherwise noted. argument. If request can't be handled without leaving RCU mode, have it return ERR_PTR(-ECHILD). + If the filesystem stores the symlink target in ->i_link, the VFS may use it directly without calling ->get_link(); however, ->get_link() must still be provided. ->i_link must not be freed until after an RCU grace period. Writing to ->i_link post-iget() time requires a 'release' memory barrier. - readlink: this is now just an override for use by readlink(2) for the +``readlink``: this is now just an override for use by readlink(2) for the cases when ->get_link uses nd_jump_link() or object is not in fact a symlink. Normally filesystems should only implement ->get_link for symlinks and readlink(2) will automatically use that. - permission: called by the VFS to check for access rights on a POSIX-like +``permission``: called by the VFS to check for access rights on a POSIX-like filesystem. May be called in rcu-walk mode (mask & MAY_NOT_BLOCK). If in rcu-walk @@ -485,20 +494,20 @@ otherwise noted. If a situation is encountered that rcu-walk cannot handle, return -ECHILD and it will be called again in ref-walk mode. - setattr: called by the VFS to set attributes for a file. This method +``setattr``: called by the VFS to set attributes for a file. This method is called by chmod(2) and related system calls. - getattr: called by the VFS to get attributes of a file. This method +``getattr``: called by the VFS to get attributes of a file. This method is called by stat(2) and related system calls. 
- listxattr: called by the VFS to list all extended attributes for a +``listxattr``: called by the VFS to list all extended attributes for a given file. This method is called by the listxattr(2) system call. - update_time: called by the VFS to update a specific time or the i_version of +``update_time``: called by the VFS to update a specific time or the i_version of an inode. If this is not defined the VFS will update the inode itself and call mark_inode_dirty_sync. - atomic_open: called on the last component of an open. Using this optional +``atomic_open``: called on the last component of an open. Using this optional method the filesystem can look up, possibly create and open the file in one atomic operation. If it wants to leave actual opening to the caller (e.g. if the file turned out to be a symlink, device, or just @@ -510,7 +519,7 @@ otherwise noted. the method must only succeed if the file didn't exist and hence FMODE_CREATED shall always be set on success. - tmpfile: called in the end of O_TMPFILE open(). Optional, equivalent to +``tmpfile``: called in the end of O_TMPFILE open(). Optional, equivalent to atomically creating, opening and unlinking a file in given directory. @@ -628,41 +637,43 @@ struct address_space_operations This describes how the VFS can manipulate mapping of a file to page cache in your filesystem. The following members are defined: -struct address_space_operations { - int (*writepage)(struct page *page, struct writeback_control *wbc); - int (*readpage)(struct file *, struct page *); - int (*writepages)(struct address_space *, struct writeback_control *); - int (*set_page_dirty)(struct page *page); - int (*readpages)(struct file *filp, struct address_space *mapping, - struct list_head *pages, unsigned nr_pages); - int (*write_begin)(struct file *, struct address_space *mapping, - loff_t pos, unsigned len, unsigned flags, +.. 
code-block:: c + + struct address_space_operations { + int (*writepage)(struct page *page, struct writeback_control *wbc); + int (*readpage)(struct file *, struct page *); + int (*writepages)(struct address_space *, struct writeback_control *); + int (*set_page_dirty)(struct page *page); + int (*readpages)(struct file *filp, struct address_space *mapping, + struct list_head *pages, unsigned nr_pages); + int (*write_begin)(struct file *, struct address_space *mapping, + loff_t pos, unsigned len, unsigned flags, struct page **pagep, void **fsdata); - int (*write_end)(struct file *, struct address_space *mapping, - loff_t pos, unsigned len, unsigned copied, - struct page *page, void *fsdata); - sector_t (*bmap)(struct address_space *, sector_t); - void (*invalidatepage) (struct page *, unsigned int, unsigned int); - int (*releasepage) (struct page *, int); - void (*freepage)(struct page *); - ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter); - /* isolate a page for migration */ - bool (*isolate_page) (struct page *, isolate_mode_t); - /* migrate the contents of a page to the specified target */ - int (*migratepage) (struct page *, struct page *); - /* put migration-failed page back to right list */ - void (*putback_page) (struct page *); - int (*launder_page) (struct page *); + int (*write_end)(struct file *, struct address_space *mapping, + loff_t pos, unsigned len, unsigned copied, + struct page *page, void *fsdata); + sector_t (*bmap)(struct address_space *, sector_t); + void (*invalidatepage) (struct page *, unsigned int, unsigned int); + int (*releasepage) (struct page *, int); + void (*freepage)(struct page *); + ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter); + /* isolate a page for migration */ + bool (*isolate_page) (struct page *, isolate_mode_t); + /* migrate the contents of a page to the specified target */ + int (*migratepage) (struct page *, struct page *); + /* put migration-failed page back to right list */ + void 
(*putback_page) (struct page *); + int (*launder_page) (struct page *); - int (*is_partially_uptodate) (struct page *, unsigned long, - unsigned long); - void (*is_dirty_writeback) (struct page *, bool *, bool *); - int (*error_remove_page) (struct mapping *mapping, struct page *page); - int (*swap_activate)(struct file *); - int (*swap_deactivate)(struct file *); -}; + int (*is_partially_uptodate) (struct page *, unsigned long, + unsigned long); + void (*is_dirty_writeback) (struct page *, bool *, bool *); + int (*error_remove_page) (struct mapping *mapping, struct page *page); + int (*swap_activate)(struct file *); + int (*swap_deactivate)(struct file *); + }; - writepage: called by the VM to write a dirty page to backing store. +``writepage``: called by the VM to write a dirty page to backing store. This may happen for data integrity reasons (i.e. 'sync'), or to free up memory (flush). The difference can be seen in wbc->sync_mode. @@ -680,7 +691,7 @@ struct address_space_operations { See the file "Locking" for more details. - readpage: called by the VM to read a page from backing store. +``readpage``: called by the VM to read a page from backing store. The page will be Locked when readpage is called, and should be unlocked and marked uptodate once the read completes. If ->readpage discovers that it needs to unlock the page for @@ -688,7 +699,7 @@ struct address_space_operations { In this case, the page will be relocated, relocked and if that all succeeds, ->readpage will be called again. - writepages: called by the VM to write out pages associated with the +``writepages``: called by the VM to write out pages associated with the address_space object. If wbc->sync_mode is WBC_SYNC_ALL, then the writeback_control will specify a range of pages that must be written out. If it is WBC_SYNC_NONE, then a nr_to_write is given @@ -697,7 +708,7 @@ struct address_space_operations { instead. 
This will choose pages from the address space that are tagged as DIRTY and will pass them to ->writepage. - set_page_dirty: called by the VM to set a page dirty. +``set_page_dirty``: called by the VM to set a page dirty. This is particularly needed if an address space attaches private data to a page, and that data needs to be updated when a page is dirtied. This is called, for example, when a memory @@ -705,14 +716,14 @@ struct address_space_operations { If defined, it should set the PageDirty flag, and the PAGECACHE_TAG_DIRTY tag in the radix tree. - readpages: called by the VM to read pages associated with the address_space +``readpages``: called by the VM to read pages associated with the address_space object. This is essentially just a vector version of readpage. Instead of just one page, several pages are requested. readpages is only used for read-ahead, so read errors are ignored. If anything goes wrong, feel free to give up. - write_begin: +``write_begin``: Called by the generic buffered write code to ask the filesystem to prepare to write len bytes at the given offset in the file. The address_space should check that the write will be able to complete, @@ -722,7 +733,7 @@ struct address_space_operations { read already) so that the updated blocks can be written out properly. The filesystem must return the locked pagecache page for the specified - offset, in *pagep, for the caller to write into. + offset, in ``*pagep``, for the caller to write into. It must be able to cope with short writes (where the length passed to write_begin is greater than the number of bytes copied into the page). @@ -736,7 +747,7 @@ struct address_space_operations { Returns 0 on success; < 0 on failure (which is the error code), in which case write_end is not called. - write_end: After a successful write_begin, and data copy, write_end must +``write_end``: After a successful write_begin, and data copy, write_end must be called. 
len is the original len passed to write_begin, and copied is the amount that was able to be copied. @@ -746,7 +757,7 @@ struct address_space_operations { Returns < 0 on failure, otherwise the number of bytes (<= 'copied') that were able to be copied into pagecache. - bmap: called by the VFS to map a logical block offset within object to +``bmap``: called by the VFS to map a logical block offset within object to physical block number. This method is used by the FIBMAP ioctl and for working with swap-files. To be able to swap to a file, the file must have a stable mapping to a block @@ -754,7 +765,7 @@ struct address_space_operations { but instead uses bmap to find out where the blocks in the file are and uses those addresses directly. - invalidatepage: If a page has PagePrivate set, then invalidatepage +``invalidatepage``: If a page has PagePrivate set, then invalidatepage will be called when part or all of the page is to be removed from the address space. This generally corresponds to either a truncation, punch hole or a complete invalidation of the address @@ -766,7 +777,7 @@ struct address_space_operations { be done by calling the ->releasepage function, but in this case the release MUST succeed. - releasepage: releasepage is called on PagePrivate pages to indicate +``releasepage``: releasepage is called on PagePrivate pages to indicate that the page should be freed if possible. ->releasepage should remove any private data from the page and clear the PagePrivate flag. If releasepage() fails for some reason, it must @@ -787,40 +798,40 @@ struct address_space_operations { need to ensure this. Possibly it can clear the PageUptodate bit if it cannot free private data yet. - freepage: freepage is called once the page is no longer visible in +``freepage``: freepage is called once the page is no longer visible in the page cache in order to allow the cleanup of any private data. 
Since it may be called by the memory reclaimer, it should not assume that the original address_space mapping still exists, and it should not block. - direct_IO: called by the generic read/write routines to perform +``direct_IO``: called by the generic read/write routines to perform direct_IO - that is IO requests which bypass the page cache and transfer data directly between the storage and the application's address space. - isolate_page: Called by the VM when isolating a movable non-lru page. +``isolate_page``: Called by the VM when isolating a movable non-lru page. If page is successfully isolated, VM marks the page as PG_isolated via __SetPageIsolated. - migrate_page: This is used to compact the physical memory usage. +``migrate_page``: This is used to compact the physical memory usage. If the VM wants to relocate a page (maybe off a memory card that is signalling imminent failure) it will pass a new page and an old page to this function. migrate_page should transfer any private data across and update any references that it has to the page. - putback_page: Called by the VM when isolated page's migration fails. +``putback_page``: Called by the VM when isolated page's migration fails. - launder_page: Called before freeing a page - it writes back the dirty page. To +``launder_page``: Called before freeing a page - it writes back the dirty page. To prevent redirtying the page, it is kept locked during the whole operation. - is_partially_uptodate: Called by the VM when reading a file through the +``is_partially_uptodate``: Called by the VM when reading a file through the pagecache when the underlying blocksize != pagesize. If the required block is up to date then the read can complete without needing the IO to bring the whole page up to date. - is_dirty_writeback: Called by the VM when attempting to reclaim a page. +``is_dirty_writeback``: Called by the VM when attempting to reclaim a page. 
The VM uses dirty and writeback information to determine if it needs to stall to allow flushers a chance to complete some IO. Ordinarily it can use PageDirty and PageWriteback but some filesystems have @@ -829,17 +840,17 @@ struct address_space_operations { allows a filesystem to indicate to the VM if a page should be treated as dirty or writeback for the purposes of stalling. - error_remove_page: normally set to generic_error_remove_page if truncation +``error_remove_page``: normally set to generic_error_remove_page if truncation is ok for this address space. Used for memory failure handling. Setting this implies you deal with pages going away under you, unless you have them locked or reference counts increased. - swap_activate: Called when swapon is used on a file to allocate +``swap_activate``: Called when swapon is used on a file to allocate space if necessary and pin the block lookup information in memory. A return value of zero indicates success, in which case this file can be used to back swapspace. - swap_deactivate: Called during swapoff on files where swap_activate +``swap_deactivate``: Called during swapoff on files where swap_activate was successful. @@ -856,78 +867,80 @@ struct file_operations This describes how the VFS can manipulate an open file. 
As of kernel 4.18, the following members are defined: -struct file_operations { - struct module *owner; - loff_t (*llseek) (struct file *, loff_t, int); - ssize_t (*read) (struct file *, char __user *, size_t, loff_t *); - ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *); - ssize_t (*read_iter) (struct kiocb *, struct iov_iter *); - ssize_t (*write_iter) (struct kiocb *, struct iov_iter *); - int (*iopoll)(struct kiocb *kiocb, bool spin); - int (*iterate) (struct file *, struct dir_context *); - int (*iterate_shared) (struct file *, struct dir_context *); - __poll_t (*poll) (struct file *, struct poll_table_struct *); - long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long); - long (*compat_ioctl) (struct file *, unsigned int, unsigned long); - int (*mmap) (struct file *, struct vm_area_struct *); - int (*open) (struct inode *, struct file *); - int (*flush) (struct file *, fl_owner_t id); - int (*release) (struct inode *, struct file *); - int (*fsync) (struct file *, loff_t, loff_t, int datasync); - int (*fasync) (int, struct file *, int); - int (*lock) (struct file *, int, struct file_lock *); - ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int); - unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); - int (*check_flags)(int); - int (*flock) (struct file *, int, struct file_lock *); - ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int); - ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int); - int (*setlease)(struct file *, long, struct file_lock **, void **); - long (*fallocate)(struct file *file, int mode, loff_t offset, - loff_t len); - void (*show_fdinfo)(struct seq_file *m, struct file *f); -#ifndef CONFIG_MMU - unsigned (*mmap_capabilities)(struct file *); -#endif - ssize_t (*copy_file_range)(struct file *, loff_t, struct file *, loff_t, size_t, 
unsigned int); - loff_t (*remap_file_range)(struct file *file_in, loff_t pos_in, - struct file *file_out, loff_t pos_out, - loff_t len, unsigned int remap_flags); - int (*fadvise)(struct file *, loff_t, loff_t, int); -}; +.. code-block:: c + + struct file_operations { + struct module *owner; + loff_t (*llseek) (struct file *, loff_t, int); + ssize_t (*read) (struct file *, char __user *, size_t, loff_t *); + ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *); + ssize_t (*read_iter) (struct kiocb *, struct iov_iter *); + ssize_t (*write_iter) (struct kiocb *, struct iov_iter *); + int (*iopoll)(struct kiocb *kiocb, bool spin); + int (*iterate) (struct file *, struct dir_context *); + int (*iterate_shared) (struct file *, struct dir_context *); + __poll_t (*poll) (struct file *, struct poll_table_struct *); + long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long); + long (*compat_ioctl) (struct file *, unsigned int, unsigned long); + int (*mmap) (struct file *, struct vm_area_struct *); + int (*open) (struct inode *, struct file *); + int (*flush) (struct file *, fl_owner_t id); + int (*release) (struct inode *, struct file *); + int (*fsync) (struct file *, loff_t, loff_t, int datasync); + int (*fasync) (int, struct file *, int); + int (*lock) (struct file *, int, struct file_lock *); + ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int); + unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); + int (*check_flags)(int); + int (*flock) (struct file *, int, struct file_lock *); + ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int); + ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int); + int (*setlease)(struct file *, long, struct file_lock **, void **); + long (*fallocate)(struct file *file, int mode, loff_t offset, + loff_t len); + void (*show_fdinfo)(struct 
seq_file *m, struct file *f); + #ifndef CONFIG_MMU + unsigned (*mmap_capabilities)(struct file *); + #endif + ssize_t (*copy_file_range)(struct file *, loff_t, struct file *, loff_t, size_t, unsigned int); + loff_t (*remap_file_range)(struct file *file_in, loff_t pos_in, + struct file *file_out, loff_t pos_out, + loff_t len, unsigned int remap_flags); + int (*fadvise)(struct file *, loff_t, loff_t, int); + }; Again, all methods are called without any locks being held, unless otherwise noted. - llseek: called when the VFS needs to move the file position index +``llseek``: called when the VFS needs to move the file position index - read: called by read(2) and related system calls +``read``: called by read(2) and related system calls - read_iter: possibly asynchronous read with iov_iter as destination +``read_iter``: possibly asynchronous read with iov_iter as destination - write: called by write(2) and related system calls +``write``: called by write(2) and related system calls - write_iter: possibly asynchronous write with iov_iter as source +``write_iter``: possibly asynchronous write with iov_iter as source - iopoll: called when aio wants to poll for completions on HIPRI iocbs +``iopoll``: called when aio wants to poll for completions on HIPRI iocbs - iterate: called when the VFS needs to read the directory contents +``iterate``: called when the VFS needs to read the directory contents - iterate_shared: called when the VFS needs to read the directory contents +``iterate_shared``: called when the VFS needs to read the directory contents when filesystem supports concurrent dir iterators - poll: called by the VFS when a process wants to check if there is +``poll``: called by the VFS when a process wants to check if there is activity on this file and (optionally) go to sleep until there is activity. Called by the select(2) and poll(2) system calls - unlocked_ioctl: called by the ioctl(2) system call. +``unlocked_ioctl``: called by the ioctl(2) system call. 
- compat_ioctl: called by the ioctl(2) system call when 32 bit system calls +``compat_ioctl``: called by the ioctl(2) system call when 32 bit system calls are used on 64 bit kernels. - mmap: called by the mmap(2) system call +``mmap``: called by the mmap(2) system call - open: called by the VFS when an inode should be opened. When the VFS +``open``: called by the VFS when an inode should be opened. When the VFS opens a file, it creates a new "struct file". It then calls the open method for the newly allocated file structure. You might think that the open method really belongs in @@ -937,40 +950,40 @@ otherwise noted. "private_data" member in the file structure if you want to point to a device structure - flush: called by the close(2) system call to flush a file +``flush``: called by the close(2) system call to flush a file - release: called when the last reference to an open file is closed +``release``: called when the last reference to an open file is closed - fsync: called by the fsync(2) system call. Also see the section above +``fsync``: called by the fsync(2) system call. Also see the section above entitled "Handling errors during writeback". - fasync: called by the fcntl(2) system call when asynchronous +``fasync``: called by the fcntl(2) system call when asynchronous (non-blocking) mode is enabled for a file - lock: called by the fcntl(2) system call for F_GETLK, F_SETLK, and F_SETLKW +``lock``: called by the fcntl(2) system call for F_GETLK, F_SETLK, and F_SETLKW commands - get_unmapped_area: called by the mmap(2) system call +``get_unmapped_area``: called by the mmap(2) system call - check_flags: called by the fcntl(2) system call for F_SETFL command +``check_flags``: called by the fcntl(2) system call for F_SETFL command - flock: called by the flock(2) system call +``flock``: called by the flock(2) system call - splice_write: called by the VFS to splice data from a pipe to a file. 
This +``splice_write``: called by the VFS to splice data from a pipe to a file. This method is used by the splice(2) system call - splice_read: called by the VFS to splice data from file to a pipe. This +``splice_read``: called by the VFS to splice data from file to a pipe. This method is used by the splice(2) system call - setlease: called by the VFS to set or release a file lock lease. setlease +``setlease``: called by the VFS to set or release a file lock lease. setlease implementations should call generic_setlease to record or remove the lease in the inode after setting it. - fallocate: called by the VFS to preallocate blocks or punch a hole. +``fallocate``: called by the VFS to preallocate blocks or punch a hole. - copy_file_range: called by the copy_file_range(2) system call. +``copy_file_range``: called by the copy_file_range(2) system call. - remap_file_range: called by the ioctl(2) system call for FICLONERANGE and +``remap_file_range``: called by the ioctl(2) system call for FICLONERANGE and FICLONE and FIDEDUPERANGE commands to remap file ranges. An implementation should remap len bytes at pos_in of the source file into the dest file at pos_out. Implementations must handle callers passing @@ -983,7 +996,7 @@ otherwise noted. set, the caller is ok with the implementation shortening the request length to satisfy alignment or EOF requirements (or any other reason). - fadvise: possibly called by the fadvise64() system call. +``fadvise``: possibly called by the fadvise64() system call. Note that the file operations are implemented by the specific filesystem in which the inode resides. When opening a device node @@ -1010,23 +1023,25 @@ here. These methods may be set to NULL, as they are either optional or the VFS uses a default. 
As of kernel 2.6.22, the following members are defined: -struct dentry_operations { - int (*d_revalidate)(struct dentry *, unsigned int); - int (*d_weak_revalidate)(struct dentry *, unsigned int); - int (*d_hash)(const struct dentry *, struct qstr *); - int (*d_compare)(const struct dentry *, - unsigned int, const char *, const struct qstr *); - int (*d_delete)(const struct dentry *); - int (*d_init)(struct dentry *); - void (*d_release)(struct dentry *); - void (*d_iput)(struct dentry *, struct inode *); - char *(*d_dname)(struct dentry *, char *, int); - struct vfsmount *(*d_automount)(struct path *); - int (*d_manage)(const struct path *, bool); - struct dentry *(*d_real)(struct dentry *, const struct inode *); -}; +.. code-block:: c - d_revalidate: called when the VFS needs to revalidate a dentry. This + struct dentry_operations { + int (*d_revalidate)(struct dentry *, unsigned int); + int (*d_weak_revalidate)(struct dentry *, unsigned int); + int (*d_hash)(const struct dentry *, struct qstr *); + int (*d_compare)(const struct dentry *, + unsigned int, const char *, const struct qstr *); + int (*d_delete)(const struct dentry *); + int (*d_init)(struct dentry *); + void (*d_release)(struct dentry *); + void (*d_iput)(struct dentry *, struct inode *); + char *(*d_dname)(struct dentry *, char *, int); + struct vfsmount *(*d_automount)(struct path *); + int (*d_manage)(const struct path *, bool); + struct dentry *(*d_real)(struct dentry *, const struct inode *); + }; + +``d_revalidate``: called when the VFS needs to revalidate a dentry. This is called whenever a name look-up finds a dentry in the dcache. Most local filesystems leave this as NULL, because all their dentries in the dcache are valid. Network filesystems are different @@ -1045,7 +1060,7 @@ struct dentry_operations { If a situation is encountered that rcu-walk cannot handle, return -ECHILD and it will be called again in ref-walk mode. 
- d_weak_revalidate: called when the VFS needs to revalidate a "jumped" dentry. +``d_weak_revalidate``: called when the VFS needs to revalidate a "jumped" dentry. This is called when a path-walk ends at dentry that was not acquired by doing a lookup in the parent directory. This includes "/", "." and "..", as well as procfs-style symlinks and mountpoint traversal. @@ -1059,14 +1074,14 @@ struct dentry_operations { d_weak_revalidate is only called after leaving rcu-walk mode. - d_hash: called when the VFS adds a dentry to the hash table. The first +``d_hash``: called when the VFS adds a dentry to the hash table. The first dentry passed to d_hash is the parent directory that the name is to be hashed into. Same locking and synchronisation rules as d_compare regarding what is safe to dereference etc. - d_compare: called to compare a dentry name with a given name. The first +``d_compare``: called to compare a dentry name with a given name. The first dentry is the parent of the dentry to be compared, the second is the child dentry. len and name string are properties of the dentry to be compared. qstr is the name to compare it with. @@ -1083,22 +1098,22 @@ struct dentry_operations { It is a tricky calling convention because it needs to be called under "rcu-walk", ie. without any locks or references on things. - d_delete: called when the last reference to a dentry is dropped and the +``d_delete``: called when the last reference to a dentry is dropped and the dcache is deciding whether or not to cache it. Return 1 to delete immediately, or 0 to cache the dentry. Default is NULL which means to always cache a reachable dentry. d_delete must be constant and idempotent.
- d_init: called when a dentry is allocated +``d_init``: called when a dentry is allocated - d_release: called when a dentry is really deallocated +``d_release``: called when a dentry is really deallocated - d_iput: called when a dentry loses its inode (just prior to its +``d_iput``: called when a dentry loses its inode (just prior to its being deallocated). The default when this is NULL is that the VFS calls iput(). If you define this method, you must call iput() yourself - d_dname: called when the pathname of a dentry should be generated. +``d_dname``: called when the pathname of a dentry should be generated. Useful for some pseudo filesystems (sockfs, pipefs, ...) to delay pathname generation. (Instead of doing it when dentry is created, it's done only when the path is needed.). Real filesystems probably @@ -1112,13 +1127,15 @@ struct dentry_operations { Example : +.. code-block:: c + static char *pipefs_dname(struct dentry *dent, char *buffer, int buflen) { return dynamic_dname(dentry, buffer, buflen, "pipe:[%lu]", dentry->d_inode->i_ino); } - d_automount: called when an automount dentry is to be traversed (optional). +``d_automount``: called when an automount dentry is to be traversed (optional). This should create a new VFS mount record and return the record to the caller. The caller is supplied with a path parameter giving the automount directory to describe the automount target and the parent @@ -1138,7 +1155,7 @@ struct dentry_operations { dentry. This is set by __d_instantiate() if S_AUTOMOUNT is set on the inode being added. - d_manage: called to allow the filesystem to manage the transition from a +``d_manage``: called to allow the filesystem to manage the transition from a dentry (optional). This allows autofs, for example, to hold up clients waiting to explore behind a 'mountpoint' while letting the daemon go past and construct the subtree there. 
0 should be returned to let the @@ -1156,7 +1173,7 @@ struct dentry_operations { This function is only used if DCACHE_MANAGE_TRANSIT is set on the dentry being transited from. - d_real: overlay/union type filesystems implement this method to return one of +``d_real``: overlay/union type filesystems implement this method to return one of the underlying dentries hidden by the overlay. It is used in two different modes: @@ -1178,36 +1195,36 @@ Directory Entry Cache API There are a number of functions defined which permit a filesystem to manipulate dentries: - dget: open a new handle for an existing dentry (this just increments +``dget``: open a new handle for an existing dentry (this just increments the usage count) - dput: close a handle for a dentry (decrements the usage count). If +``dput``: close a handle for a dentry (decrements the usage count). If the usage count drops to 0, and the dentry is still in its parent's hash, the "d_delete" method is called to check whether it should be cached. If it should not be cached, or if the dentry is not hashed, it is deleted. Otherwise cached dentries are put into an LRU list to be reclaimed on memory shortage. - d_drop: this unhashes a dentry from its parents hash list. A +``d_drop``: this unhashes a dentry from its parents hash list. A subsequent call to dput() will deallocate the dentry if its usage count drops to 0 - d_delete: delete a dentry. If there are no other open references to +``d_delete``: delete a dentry. If there are no other open references to the dentry then the dentry is turned into a negative dentry (the d_iput() method is called). If there are other references, then d_drop() is called instead - d_add: add a dentry to its parents hash list and then calls +``d_add``: add a dentry to its parents hash list and then calls d_instantiate() - d_instantiate: add a dentry to the alias hash list for the inode and +``d_instantiate``: add a dentry to the alias hash list for the inode and updates the "d_inode" member. 
The "i_count" member in the inode structure should be set/incremented. If the inode pointer is NULL, the dentry is called a "negative dentry". This function is commonly called when an inode is created for an existing negative dentry - d_lookup: look up a dentry given its parent and path name component +``d_lookup``: look up a dentry given its parent and path name component It looks up the child of that given name from the dcache hash table. If it is found, the reference count is incremented and the dentry is returned. The caller must use dput() From 44f42165177e6c32f3a6aaceeaf7d9cd1c95595f Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Wed, 29 May 2019 20:09:24 -0300 Subject: [PATCH 020/129] scripts/sphinx-pre-install: make activate hint smarter It is possible that multiple Sphinx virtualenvs are installed on a given kernel tree. Change the logic to get the latest version of those, as this is probably what the user wants. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- scripts/sphinx-pre-install | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/scripts/sphinx-pre-install b/scripts/sphinx-pre-install index 8c2d1bcf2e02..11239eb29695 100755 --- a/scripts/sphinx-pre-install +++ b/scripts/sphinx-pre-install @@ -1,7 +1,7 @@ #!/usr/bin/perl use strict; -# Copyright (c) 2017 Mauro Carvalho Chehab +# Copyright (c) 2017-2019 Mauro Carvalho Chehab # # This program is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public License @@ -15,6 +15,7 @@ use strict; my $conf = "Documentation/conf.py"; my $requirement_file = "Documentation/sphinx/requirements.txt"; +my $virtenv_prefix = "sphinx_"; # # Static vars @@ -28,7 +29,8 @@ my $need_symlink = 0; my $need_sphinx = 0; my $rec_sphinx_upgrade = 0; my $install = ""; -my $virtenv_dir = "sphinx_"; +my $virtenv_dir = ""; +my $min_version; # # Command line arguments @@ -229,7 +231,6 @@ sub get_sphinx_fname() sub 
check_sphinx() { - my $min_version; my $rec_version; my $cur_version; @@ -255,7 +256,7 @@ sub check_sphinx() die "Can't get recommended sphinx version from $requirement_file" if (!$min_version); - $virtenv_dir .= $rec_version; + $virtenv_dir = $virtenv_prefix . $rec_version; my $sphinx = get_sphinx_fname(); return if ($sphinx eq ""); @@ -612,18 +613,23 @@ sub check_needs() which("sphinx-build-3"); } if ($need_sphinx || $rec_sphinx_upgrade) { - my $activate = "$virtenv_dir/bin/activate"; - if (-e "$ENV{'PWD'}/$activate") { + my $min_activate = "$ENV{'PWD'}/${virtenv_prefix}${min_version}/bin/activate"; + my @activates = glob "$ENV{'PWD'}/${virtenv_prefix}*/bin/activate"; + + @activates = sort {$b cmp $a} @activates; + + if (scalar @activates > 0 && $activates[0] ge $min_activate) { printf "\nNeed to activate virtualenv with:\n"; - printf "\t. $activate\n"; + printf "\t. $activates[0]\n"; } else { + my $rec_activate = "$virtenv_dir/bin/activate"; my $virtualenv = findprog("virtualenv-3"); $virtualenv = findprog("virtualenv-3.5") if (!$virtualenv); $virtualenv = findprog("virtualenv") if (!$virtualenv); $virtualenv = "virtualenv" if (!$virtualenv); printf "\t$virtualenv $virtenv_dir\n"; - printf "\t. $activate\n"; + printf "\t. $rec_activate\n"; printf "\tpip install -r $requirement_file\n"; $need++ if (!$rec_sphinx_upgrade); From c4c562defedb7634a717293a5192071983e79781 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Wed, 29 May 2019 20:09:25 -0300 Subject: [PATCH 021/129] scripts/sphinx-pre-install: get rid of RHEL7 explicity check RHEL8 was already launched. This test won't get it, and will do the wrong thing. Ok, we could fix it, but now we check Sphinx version to ensure that it matches the minimal (1.3), so there's no need for an explicit check there. 
Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- scripts/sphinx-pre-install | 13 ------------- 1 file changed, 13 deletions(-) diff --git a/scripts/sphinx-pre-install b/scripts/sphinx-pre-install index 11239eb29695..ded3e2ef3f8d 100755 --- a/scripts/sphinx-pre-install +++ b/scripts/sphinx-pre-install @@ -581,19 +581,6 @@ sub check_needs() print "Unknown OS\n"; } - # RHEL 7.x and clones have Sphinx version 1.1.x and incomplete texlive - if (($system_release =~ /Red Hat Enterprise Linux/) || - ($system_release =~ /CentOS/) || - ($system_release =~ /Scientific Linux/) || - ($system_release =~ /Oracle Linux Server/)) { - $virtualenv = 1; - $pdf = 0; - - printf("NOTE: On this distro, Sphinx and TexLive shipped versions are incompatible\n"); - printf("with doc build. So, use Sphinx via a Python virtual environment.\n\n"); - printf("This script can't install a TexLive version that would provide PDF.\n"); - } - # Check for needed programs/tools check_sphinx(); check_perl_module("Pod::Usage", 0); From 9b88ad5464af1bf7228991f1c46a9a13484790a4 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Wed, 29 May 2019 20:09:26 -0300 Subject: [PATCH 022/129] scripts/sphinx-pre-install: always check if version is compatible with build Call the script every time a make docs target is selected, on a simplified check mode. With this change, the script will set two vars: $min_version - obtained from `needs_sphinx` var inside conf.py (currently, '1.3') $rec_version - obtained from sphinx/requirements.txt. 
With those changes, a target like "make htmldocs" will do:

1) If no sphinx-build/sphinx-build-3 is found, it will run the script in
   normal mode as before, checking for all system dependencies and
   providing install hints for the needed programs, and will abort the
   build;

2) If no sphinx-build/sphinx-build-3 is found, but there is a
   sphinx_${VER}/bin/activate file, and ${VER} >= $min_version (string
   comparison), it will run in full mode and will recommend activating
   the virtualenv. If there are multiple virtualenvs, it will
   string-sort the versions, recommend the highest one, and abort the
   build;

3) If Sphinx is detected but has a version lower than $min_version, it
   will run in full mode, which will recommend creating a virtualenv
   using sphinx/requirements.txt, and will abort the build;

4) If Sphinx is detected and its version is lower than $rec_version, it
   will run in full mode and will recommend creating a virtualenv using
   sphinx/requirements.txt. In this case, it **won't** abort the build;

5) If Sphinx is detected and its version is equal to or higher than
   $rec_version, it will return just after detecting the version
   ("quick mode"), without checking whether any dependencies are
   missing.

Just like before, if one wants to install Sphinx from the distro, one
has to call the script manually with the `--no-virtualenv` argument to
get the hints for the OS in use:

    You should run:

        sudo dnf install -y python3-sphinx python3-sphinx_rtd_theme

While here, add a short help text for the three optional arguments of
the script.
Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- Documentation/Makefile | 5 +++++ scripts/sphinx-pre-install | 46 +++++++++++++++++++++++++------------- 2 files changed, 35 insertions(+), 16 deletions(-) diff --git a/Documentation/Makefile b/Documentation/Makefile index e889e7cb8511..380e24053d6f 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -70,12 +70,14 @@ quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4) $(abspath $(BUILDDIR)/$3/$4) htmldocs: + @./scripts/sphinx-pre-install --version-check @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var))) linkcheckdocs: @$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,linkcheck,$(var),,$(var))) latexdocs: + @./scripts/sphinx-pre-install --version-check @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,latex,$(var),latex,$(var))) ifeq ($(HAVE_PDFLATEX),0) @@ -87,14 +89,17 @@ pdfdocs: else # HAVE_PDFLATEX pdfdocs: latexdocs + @./scripts/sphinx-pre-install --version-check $(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX="$(PDFLATEX)" LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex || exit;) endif # HAVE_PDFLATEX epubdocs: + @./scripts/sphinx-pre-install --version-check @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,epub,$(var),epub,$(var))) xmldocs: + @./scripts/sphinx-pre-install --version-check @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,xml,$(var),xml,$(var))) endif # HAVE_SPHINX diff --git a/scripts/sphinx-pre-install b/scripts/sphinx-pre-install index ded3e2ef3f8d..f001fc2fcf12 100755 --- a/scripts/sphinx-pre-install +++ b/scripts/sphinx-pre-install @@ -38,6 +38,7 @@ my $min_version; my $pdf = 1; my $virtualenv = 1; +my $version_check = 0; # # List of required texlive packages on Fedora and OpenSuse @@ -277,20 +278,22 @@ sub check_sphinx() die "$sphinx didn't return its version" if (!$cur_version); - printf "Sphinx version %s (minimal: %s, recommended >= %s)\n", - $cur_version, $min_version, $rec_version; - if 
($cur_version lt $min_version) { - print "Warning: Sphinx version should be >= $min_version\n\n"; + printf "ERROR: Sphinx version is %s. It should be >= %s (recommended >= %s)\n", + $cur_version, $min_version, $rec_version;; $need_sphinx = 1; return; } if ($cur_version lt $rec_version) { + printf "Sphinx version %s\n", $cur_version; print "Warning: It is recommended at least Sphinx version $rec_version.\n"; - print " To upgrade, use:\n\n"; $rec_sphinx_upgrade = 1; + return; } + + # On version check mode, just assume Sphinx has all mandatory deps + exit (0) if ($version_check); } # @@ -575,14 +578,18 @@ sub check_distros() sub check_needs() { - if ($system_release) { - print "Detected OS: $system_release.\n"; - } else { - print "Unknown OS\n"; - } - # Check for needed programs/tools check_sphinx(); + + if ($system_release) { + print "Detected OS: $system_release.\n\n"; + } else { + print "Unknown OS\n\n"; + } + + print "To upgrade Sphinx, use:\n\n" if ($rec_sphinx_upgrade); + + # Check for needed programs/tools check_perl_module("Pod::Usage", 0); check_program("make", 0); check_program("gcc", 0); @@ -601,13 +608,14 @@ sub check_needs() } if ($need_sphinx || $rec_sphinx_upgrade) { my $min_activate = "$ENV{'PWD'}/${virtenv_prefix}${min_version}/bin/activate"; - my @activates = glob "$ENV{'PWD'}/${virtenv_prefix}*/bin/activate"; + my @activates = glob "$ENV{'PWD'}/${virtenv_prefix}*/bin/activate"; - @activates = sort {$b cmp $a} @activates; + @activates = sort {$b cmp $a} @activates; - if (scalar @activates > 0 && $activates[0] ge $min_activate) { - printf "\nNeed to activate virtualenv with:\n"; + if ($need_sphinx && scalar @activates > 0 && $activates[0] ge $min_activate) { + printf "\nNeed to activate a compatible Sphinx version on virtualenv with:\n"; printf "\t. 
$activates[0]\n"; + exit (1); } else { my $rec_activate = "$virtenv_dir/bin/activate"; my $virtualenv = findprog("virtualenv-3"); @@ -646,8 +654,14 @@ while (@ARGV) { $virtualenv = 0; } elsif ($arg eq "--no-pdf"){ $pdf = 0; + } elsif ($arg eq "--version-check"){ + $version_check = 1; } else { - print "Usage:\n\t$0 <--no-virtualenv> <--no-pdf>\n\n"; + print "Usage:\n\t$0 <--no-virtualenv> <--no-pdf> <--version-check>\n\n"; + print "Where:\n"; + print "\t--no-virtualenv\t- Recommend installing Sphinx instead of using a virtualenv\n"; + print "\t--version-check\t- if version is compatible, don't check for missing dependencies\n"; + print "\t--no-pdf\t- don't check for dependencies required to build PDF docs\n\n"; exit -1; } } From 9e78e7fc0b20bcc0d5599f71d297b6fa1a2e7c5f Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Wed, 29 May 2019 20:09:27 -0300 Subject: [PATCH 023/129] scripts/documentation-file-ref-check: better handle translations Only seek for translation renames inside the translation directory. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- scripts/documentation-file-ref-check | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/scripts/documentation-file-ref-check b/scripts/documentation-file-ref-check index 63e9542656f1..6b622b88f4cf 100755 --- a/scripts/documentation-file-ref-check +++ b/scripts/documentation-file-ref-check @@ -141,6 +141,10 @@ print "Auto-fixing broken references. Please double-check the results\n"; foreach my $ref (keys %broken_ref) { my $new =$ref; + my $basedir = "."; + # On translations, only seek inside the translations directory + $basedir = $1 if ($ref =~ m,(Documentation/translations/[^/]+),); + # get just the basename $new =~ s,.*/,,; @@ -161,18 +165,18 @@ foreach my $ref (keys %broken_ref) { # usual reason for breakage: file renamed to .rst if (!$f) { $new =~ s/\.txt$/.rst/; - $f=qx(find . 
-iname $new) if ($new); + $f=qx(find $basedir -iname $new) if ($new); } # usual reason for breakage: use dash or underline if (!$f) { $new =~ s/[-_]/[-_]/g; - $f=qx(find . -iname $new) if ($new); + $f=qx(find $basedir -iname $new) if ($new); } # Wild guess: seek for the same name on another place if (!$f) { - $f = qx(find . -iname $new) if ($new); + $f = qx(find $basedir -iname $new) if ($new); } my @find = split /\s+/, $f; From aeaacbfed853c17b8ac5e73c21f54d7f0805d899 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Wed, 29 May 2019 20:09:28 -0300 Subject: [PATCH 024/129] scripts/documentation-file-ref-check: exclude false-positives There are at least two cases where a documentation file is gone for good, but the text still mentions it: 1) drivers/vhost/vhost.c: the reference to Documentation/virtual/lguest/lguest.c is just to give credit to the original work that vhost replaced; 2) Documentation/scsi/scsi_mid_low_api.txt: It gives credit and mentions the old Documentation/Configure.help file that used to be part of Kernel 2.4.x As we don't want the script to keep pointing at those every time, let's add logic to the script that allows it to ignore valid false-positives like the above. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- scripts/documentation-file-ref-check | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/scripts/documentation-file-ref-check b/scripts/documentation-file-ref-check index 6b622b88f4cf..05235775cc71 100755 --- a/scripts/documentation-file-ref-check +++ b/scripts/documentation-file-ref-check @@ -8,6 +8,14 @@ use warnings; use strict; use Getopt::Long qw(:config no_auto_abbrev); +# NOTE: only add things here when the file was gone, but the text wants +# to mention a past documentation file, for example, to give credits for +# the original work.
+my %false_positives = ( + "Documentation/scsi/scsi_mid_low_api.txt" => "Documentation/Configure.help", + "drivers/vhost/vhost.c" => "Documentation/virtual/lguest/lguest.c", +); + my $scriptname = $0; $scriptname =~ s,.*/([^/]+/),$1,; @@ -122,6 +130,11 @@ while (<IN>) { next if (grep -e, glob("$path/$ref $path/$fulref")); } + # Discard known false-positives + if (defined($false_positives{$f})) { + next if ($false_positives{$f} eq $fulref); + } + if ($fix) { if (!($ref =~ m/(scripts|Kconfig|Kbuild)/)) { $broken_ref{$ref}++; From 4904aeed9f686c90dba72980f0067ac1a7dbbfb6 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Wed, 29 May 2019 20:09:29 -0300 Subject: [PATCH 025/129] scripts/documentation-file-ref-check: improve tools ref handling There's a false positive on perf/util: tools/perf/util/s390-cpumsf.c: Documentation/perf.data-file-format.txt The file is there at tools/perf/Documentation/, but the logic which detects relative documentation references inside tools is not capable of detecting it. So, improve it. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- scripts/documentation-file-ref-check | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/documentation-file-ref-check b/scripts/documentation-file-ref-check index 05235775cc71..5d775ca7469b 100755 --- a/scripts/documentation-file-ref-check +++ b/scripts/documentation-file-ref-check @@ -127,7 +127,7 @@ while (<IN>) { if ($f =~ m/tools/) { my $path = $f; $path =~ s,(.*)/.*,$1,; - next if (grep -e, glob("$path/$ref $path/$fulref")); + next if (grep -e, glob("$path/$ref $path/../$ref $path/$fulref")); } # Discard known false-positives From 0ca862e6f1c7e58e4eb9758fdb09255e6104d6a0 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Wed, 29 May 2019 20:09:30 -0300 Subject: [PATCH 026/129] scripts/documentation-file-ref-check: teach about .txt -> .yaml renames At DT, files are being renamed to .yaml.
Teach the script how to handle such renames when used in fix mode. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- scripts/documentation-file-ref-check | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/scripts/documentation-file-ref-check b/scripts/documentation-file-ref-check index 5d775ca7469b..ff16db269079 100755 --- a/scripts/documentation-file-ref-check +++ b/scripts/documentation-file-ref-check @@ -165,13 +165,22 @@ foreach my $ref (keys %broken_ref) { # usual reason for breakage: DT file moved around if ($ref =~ /devicetree/) { - my $search = $new; - $search =~ s,^.*/,,; - $f = qx(find Documentation/devicetree/ -iname "*$search*") if ($search); + # usual reason for breakage: DT file renamed to .yaml if (!$f) { - # Manufacturer name may have changed - $search =~ s/^.*,//; + my $new_ref = $ref; + $new_ref =~ s/\.txt$/.yaml/; + $f=$new_ref if (-f $new_ref); + } + + if (!$f) { + my $search = $new; + $search =~ s,^.*/,,; $f = qx(find Documentation/devicetree/ -iname "*$search*") if ($search); + if (!$f) { + # Manufacturer name may have changed + $search =~ s/^.*,//; + $f = qx(find Documentation/devicetree/ -iname "*$search*") if ($search); + } } } From cf08508d21ffae5aea6c7dcb771ebd28612c6120 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Wed, 29 May 2019 20:09:31 -0300 Subject: [PATCH 027/129] docs: by default, build docs a lot faster with Sphinx >= 1.7 Since Sphinx version 1.7, it is possible to use "-jauto" in order to speed up documentation builds. On older versions, while -j was already supported, one would need to set the number of threads manually. So, if SPHINXOPTS is not provided, add -jauto, in order to speed up the build. That makes it *a lot* faster than without -j.
If one really wants to slow things down, one can just use: make SPHINXOPTS=-j1 htmldocs Signed-off-by: Mauro Carvalho Chehab [ jc: fixed perl magic to determine sphinx version ] Signed-off-by: Jonathan Corbet --- Documentation/Makefile | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Documentation/Makefile b/Documentation/Makefile index 380e24053d6f..85d3cfafd77c 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -28,6 +28,8 @@ ifeq ($(HAVE_SPHINX),0) else # HAVE_SPHINX +export SPHINXOPTS = $(shell perl -e 'open IN,"sphinx-build --version 2>&1 |"; while (<IN>) { if (m/([\d\.]+)/) { print "-jauto" if ($$1 >= "1.7") } ;} close IN') + # User-friendly check for pdflatex and latexmk HAVE_PDFLATEX := $(shell if which $(PDFLATEX) >/dev/null 2>&1; then echo 1; else echo 0; fi) HAVE_LATEXMK := $(shell if which latexmk >/dev/null 2>&1; then echo 1; else echo 0; fi) From a700767a7682d9bd237e927253274859aee075e7 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Wed, 29 May 2019 20:09:32 -0300 Subject: [PATCH 028/129] docs: requirements.txt: recommend Sphinx 1.7.9 As discussed at the linux-doc ML, while we'll still support version 1.3, it is time to recommend a more modern version. So, let's switch the recommended version to Sphinx 1.7.9, as it has the "-jauto" flag, which makes building the documentation a lot faster. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- Documentation/doc-guide/sphinx.rst | 17 ++++++++--------- Documentation/sphinx/requirements.txt | 4 ++-- 2 files changed, 10 insertions(+), 11 deletions(-) diff --git a/Documentation/doc-guide/sphinx.rst b/Documentation/doc-guide/sphinx.rst index c039224b404e..4ba081f43e98 100644 --- a/Documentation/doc-guide/sphinx.rst +++ b/Documentation/doc-guide/sphinx.rst @@ -27,8 +27,7 @@ Sphinx Install ============== The ReST markups currently used by the Documentation/ files are meant to be -built with ``Sphinx`` version 1.3 or higher.
If you desire to build -PDF output, it is recommended to use version 1.4.6 or higher. +built with ``Sphinx`` version 1.3 or higher. There's a script that checks for the Sphinx requirements. Please see :ref:`sphinx-pre-install` for further details. @@ -56,13 +55,13 @@ or ``virtualenv``, depending on how your distribution packaged Python 3. those expressions are written using LaTeX notation. It needs texlive installed with amdfonts and amsmath in order to evaluate them. -In summary, if you want to install Sphinx version 1.4.9, you should do:: +In summary, if you want to install Sphinx version 1.7.9, you should do:: - $ virtualenv sphinx_1.4 - $ . sphinx_1.4/bin/activate - (sphinx_1.4) $ pip install -r Documentation/sphinx/requirements.txt + $ virtualenv sphinx_1.7.9 + $ . sphinx_1.7.9/bin/activate + (sphinx_1.7.9) $ pip install -r Documentation/sphinx/requirements.txt -After running ``. sphinx_1.4/bin/activate``, the prompt will change, +After running ``. sphinx_1.7.9/bin/activate``, the prompt will change, in order to indicate that you're using the new environment. If you open a new shell, you need to rerun this command to enter again at the virtual environment before building the documentation. @@ -105,8 +104,8 @@ command line options for your distro:: You should run: sudo dnf install -y texlive-luatex85 - /usr/bin/virtualenv sphinx_1.4 - . sphinx_1.4/bin/activate + /usr/bin/virtualenv sphinx_1.7.9 + . sphinx_1.7.9/bin/activate pip install -r Documentation/sphinx/requirements.txt Can't build as 1 mandatory dependency is missing at ./scripts/sphinx-pre-install line 468. 
diff --git a/Documentation/sphinx/requirements.txt b/Documentation/sphinx/requirements.txt
index 742be3e12619..14e29a0ae480 100644
--- a/Documentation/sphinx/requirements.txt
+++ b/Documentation/sphinx/requirements.txt
@@ -1,3 +1,3 @@
-docutils==0.12
-Sphinx==1.4.9
+docutils
+Sphinx==1.7.9
 sphinx_rtd_theme

From 6c01edd395a7cc7bb82333e953992eb0e76b1c35 Mon Sep 17 00:00:00 2001
From: Jonathan Corbet
Date: Fri, 31 May 2019 10:02:11 -0600
Subject: [PATCH 029/129] docs: look for sphinx-pre-install in the source tree

Recent makefile changes included an invocation of
./scripts/sphinx-pre-install. Unfortunately, that fails when a separate
build directory is in use with:

  /bin/bash: ./scripts/sphinx-pre-install: No such file or directory

Use $(srctree) to fully specify the location of this script.

Signed-off-by: Jonathan Corbet
---
 Documentation/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 85d3cfafd77c..2edd03b1dad6 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -23,7 +23,7 @@ ifeq ($(HAVE_SPHINX),0)
 .DEFAULT:
 	$(warning The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed and in PATH, or set the SPHINXBUILD make variable to point to the full path of the '$(SPHINXBUILD)' executable.)
 	@echo
-	@./scripts/sphinx-pre-install
+	@$(srctree)/scripts/sphinx-pre-install
 	@echo " SKIP Sphinx $@ target."

 else # HAVE_SPHINX

From 18e1572419d69f8d45248cccabc40352a3e281d6 Mon Sep 17 00:00:00 2001
From: Jonathan Corbet
Date: Tue, 4 Jun 2019 07:55:49 -0600
Subject: [PATCH 030/129] docs: Completely fix the remote build tree case

My previous fix miserably failed to catch all of the invocations of
"./scripts/sphinx-pre-install", so we got build errors. Try again with
more caffeine.

Reported-by: kbuild test robot
Signed-off-by: Jonathan Corbet
---
 Documentation/Makefile | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 2edd03b1dad6..2df0789f90b7 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -72,14 +72,14 @@ quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4)
 	$(abspath $(BUILDDIR)/$3/$4)

 htmldocs:
-	@./scripts/sphinx-pre-install --version-check
+	@$(srctree)/scripts/sphinx-pre-install --version-check
 	@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var)))

 linkcheckdocs:
 	@$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,linkcheck,$(var),,$(var)))

 latexdocs:
-	@./scripts/sphinx-pre-install --version-check
+	@$(srctree)/scripts/sphinx-pre-install --version-check
 	@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,latex,$(var),latex,$(var)))

 ifeq ($(HAVE_PDFLATEX),0)
@@ -91,17 +91,17 @@ pdfdocs:
 else # HAVE_PDFLATEX

 pdfdocs: latexdocs
-	@./scripts/sphinx-pre-install --version-check
+	@$(srctree)/scripts/sphinx-pre-install --version-check
 	$(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX="$(PDFLATEX)" LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex || exit;)

 endif # HAVE_PDFLATEX

 epubdocs:
-	@./scripts/sphinx-pre-install --version-check
+	@$(srctree)/scripts/sphinx-pre-install --version-check
 	@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,epub,$(var),epub,$(var)))

 xmldocs:
-	@./scripts/sphinx-pre-install --version-check
+	@$(srctree)/scripts/sphinx-pre-install --version-check
 	@+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,xml,$(var),xml,$(var)))

 endif # HAVE_SPHINX

From ee5dc0491c38ae4e4e583d7532d470754bb173f6 Mon Sep 17 00:00:00 2001
From: "Tobin C. Harding"
Date: Tue, 4 Jun 2019 10:26:56 +1000
Subject: [PATCH 031/129] docs: filesystems: vfs: Render method descriptions

Currently vfs.rst does not render well into HTML the method descriptions
for VFS data structures.
We can improve the HTML output by putting the description string on a new line following the method name. Suggested-by: Jonathan Corbet Signed-off-by: Tobin C. Harding Signed-off-by: Jonathan Corbet --- Documentation/filesystems/vfs.rst | 1061 ++++++++++++++++------------- 1 file changed, 599 insertions(+), 462 deletions(-) diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst index 2ffbdf5f392c..0f85ab21c2ca 100644 --- a/Documentation/filesystems/vfs.rst +++ b/Documentation/filesystems/vfs.rst @@ -125,35 +125,46 @@ members are defined: struct lock_class_key s_umount_key; }; -``name``: the name of the filesystem type, such as "ext2", "iso9660", +``name`` + the name of the filesystem type, such as "ext2", "iso9660", "msdos" and so on -``fs_flags``: various flags (i.e. FS_REQUIRES_DEV, FS_NO_DCACHE, etc.) +``fs_flags`` + various flags (i.e. FS_REQUIRES_DEV, FS_NO_DCACHE, etc.) -``mount``: the method to call when a new instance of this filesystem should -be mounted +``mount`` + the method to call when a new instance of this filesystem should + be mounted -``kill_sb``: the method to call when an instance of this filesystem - should be shut down +``kill_sb`` + the method to call when an instance of this filesystem should be + shut down -``owner``: for internal VFS use: you should initialize this to THIS_MODULE in - most cases. -``next``: for internal VFS use: you should initialize this to NULL +``owner`` + for internal VFS use: you should initialize this to THIS_MODULE + in most cases. 
+ +``next`` + for internal VFS use: you should initialize this to NULL s_lock_key, s_umount_key: lockdep-specific The mount() method has the following arguments: -``struct file_system_type *fs_type``: describes the filesystem, partly initialized - by the specific filesystem code +``struct file_system_type *fs_type`` + describes the filesystem, partly initialized by the specific + filesystem code -``int flags``: mount flags +``int flags`` + mount flags -``const char *dev_name``: the device name we are mounting. +``const char *dev_name`` + the device name we are mounting. -``void *data``: arbitrary mount options, usually comes as an ASCII - string (see "Mount Options" section) +``void *data`` + arbitrary mount options, usually comes as an ASCII string (see + "Mount Options" section) The mount() method must return the root dentry of the tree requested by caller. An active reference to its superblock must be grabbed and the @@ -178,22 +189,27 @@ implementation. Usually, a filesystem uses one of the generic mount() implementations and provides a fill_super() callback instead. The generic variants are: -``mount_bdev``: mount a filesystem residing on a block device +``mount_bdev`` + mount a filesystem residing on a block device -``mount_nodev``: mount a filesystem that is not backed by a device +``mount_nodev`` + mount a filesystem that is not backed by a device -``mount_single``: mount a filesystem which shares the instance between - all mounts +``mount_single`` + mount a filesystem which shares the instance between all mounts A fill_super() callback implementation has the following arguments: -``struct super_block *sb``: the superblock structure. The callback - must initialize this properly. +``struct super_block *sb`` + the superblock structure. The callback must initialize this + properly. 
-``void *data``: arbitrary mount options, usually comes as an ASCII - string (see "Mount Options" section) +``void *data`` + arbitrary mount options, usually comes as an ASCII string (see + "Mount Options" section) -``int silent``: whether or not to be silent on error +``int silent`` + whether or not to be silent on error The Superblock Object @@ -240,87 +256,106 @@ noted. This means that most methods can block safely. All methods are only called from a process context (i.e. not from an interrupt handler or bottom half). -``alloc_inode``: this method is called by alloc_inode() to allocate memory - for struct inode and initialize it. If this function is not +``alloc_inode`` + this method is called by alloc_inode() to allocate memory for + struct inode and initialize it. If this function is not defined, a simple 'struct inode' is allocated. Normally alloc_inode will be used to allocate a larger structure which contains a 'struct inode' embedded within it. -``destroy_inode``: this method is called by destroy_inode() to release - resources allocated for struct inode. It is only required if +``destroy_inode`` + this method is called by destroy_inode() to release resources + allocated for struct inode. It is only required if ->alloc_inode was defined and simply undoes anything done by ->alloc_inode. -``dirty_inode``: this method is called by the VFS to mark an inode dirty. +``dirty_inode`` + this method is called by the VFS to mark an inode dirty. -``write_inode``: this method is called when the VFS needs to write an - inode to disc. The second parameter indicates whether the write - should be synchronous or not, not all filesystems check this flag. +``write_inode`` + this method is called when the VFS needs to write an inode to + disc. The second parameter indicates whether the write should + be synchronous or not, not all filesystems check this flag. -``drop_inode``: called when the last access to the inode is dropped, - with the inode->i_lock spinlock held. 
+``drop_inode`` + called when the last access to the inode is dropped, with the + inode->i_lock spinlock held. This method should be either NULL (normal UNIX filesystem - semantics) or "generic_delete_inode" (for filesystems that do not - want to cache inodes - causing "delete_inode" to always be + semantics) or "generic_delete_inode" (for filesystems that do + not want to cache inodes - causing "delete_inode" to always be called regardless of the value of i_nlink) - The "generic_delete_inode()" behavior is equivalent to the - old practice of using "force_delete" in the put_inode() case, - but does not have the races that the "force_delete()" approach - had. + The "generic_delete_inode()" behavior is equivalent to the old + practice of using "force_delete" in the put_inode() case, but + does not have the races that the "force_delete()" approach had. -``delete_inode``: called when the VFS wants to delete an inode +``delete_inode`` + called when the VFS wants to delete an inode -``put_super``: called when the VFS wishes to free the superblock +``put_super`` + called when the VFS wishes to free the superblock (i.e. unmount). This is called with the superblock lock held -``sync_fs``: called when VFS is writing out all dirty data associated with - a superblock. The second parameter indicates whether the method +``sync_fs`` + called when VFS is writing out all dirty data associated with a + superblock. The second parameter indicates whether the method should wait until the write out has been completed. Optional. -``freeze_fs``: called when VFS is locking a filesystem and - forcing it into a consistent state. This method is currently - used by the Logical Volume Manager (LVM). +``freeze_fs`` + called when VFS is locking a filesystem and forcing it into a + consistent state. This method is currently used by the Logical + Volume Manager (LVM). 
-``unfreeze_fs``: called when VFS is unlocking a filesystem and making it writable +``unfreeze_fs`` + called when VFS is unlocking a filesystem and making it writable again. -``statfs``: called when the VFS needs to get filesystem statistics. +``statfs`` + called when the VFS needs to get filesystem statistics. -``remount_fs``: called when the filesystem is remounted. This is called - with the kernel lock held +``remount_fs`` + called when the filesystem is remounted. This is called with + the kernel lock held -``clear_inode``: called then the VFS clears the inode. Optional +``clear_inode`` + called then the VFS clears the inode. Optional -``umount_begin``: called when the VFS is unmounting a filesystem. +``umount_begin`` + called when the VFS is unmounting a filesystem. -``show_options``: called by the VFS to show mount options for - /proc//mounts. (see "Mount Options" section) +``show_options`` + called by the VFS to show mount options for /proc//mounts. + (see "Mount Options" section) -``quota_read``: called by the VFS to read from filesystem quota file. +``quota_read`` + called by the VFS to read from filesystem quota file. -``quota_write``: called by the VFS to write to filesystem quota file. +``quota_write`` + called by the VFS to write to filesystem quota file. -``nr_cached_objects``: called by the sb cache shrinking function for the - filesystem to return the number of freeable cached objects it contains. +``nr_cached_objects`` + called by the sb cache shrinking function for the filesystem to + return the number of freeable cached objects it contains. Optional. -``free_cache_objects``: called by the sb cache shrinking function for the - filesystem to scan the number of objects indicated to try to free them. - Optional, but any filesystem implementing this method needs to also - implement ->nr_cached_objects for it to be called correctly. 
+``free_cache_objects`` + called by the sb cache shrinking function for the filesystem to + scan the number of objects indicated to try to free them. + Optional, but any filesystem implementing this method needs to + also implement ->nr_cached_objects for it to be called + correctly. We can't do anything with any errors that the filesystem might - encountered, hence the void return type. This will never be called if - the VM is trying to reclaim under GFP_NOFS conditions, hence this - method does not need to handle that situation itself. + encountered, hence the void return type. This will never be + called if the VM is trying to reclaim under GFP_NOFS conditions, + hence this method does not need to handle that situation itself. - Implementations must include conditional reschedule calls inside any - scanning loop that is done. This allows the VFS to determine - appropriate scan batch sizes without having to worry about whether - implementations will cause holdoff problems due to large scan batch - sizes. + Implementations must include conditional reschedule calls inside + any scanning loop that is done. This allows the VFS to + determine appropriate scan batch sizes without having to worry + about whether implementations will cause holdoff problems due to + large scan batch sizes. Whoever sets up the inode is responsible for filling in the "i_op" field. This is a pointer to a "struct inode_operations" which describes @@ -334,23 +369,31 @@ On filesystems that support extended attributes (xattrs), the s_xattr superblock field points to a NULL-terminated array of xattr handlers. Extended attributes are name:value pairs. -``name``: Indicates that the handler matches attributes with the specified name - (such as "system.posix_acl_access"); the prefix field must be NULL. +``name`` + Indicates that the handler matches attributes with the specified + name (such as "system.posix_acl_access"); the prefix field must + be NULL. 
-``prefix``: Indicates that the handler matches all attributes with the specified - name prefix (such as "user."); the name field must be NULL. +``prefix`` + Indicates that the handler matches all attributes with the + specified name prefix (such as "user."); the name field must be + NULL. -``list``: Determine if attributes matching this xattr handler should be listed - for a particular dentry. Used by some listxattr implementations like - generic_listxattr. +``list`` + Determine if attributes matching this xattr handler should be + listed for a particular dentry. Used by some listxattr + implementations like generic_listxattr. -``get``: Called by the VFS to get the value of a particular extended attribute. - This method is called by the getxattr(2) system call. +``get`` + Called by the VFS to get the value of a particular extended + attribute. This method is called by the getxattr(2) system + call. -``set``: Called by the VFS to set the value of a particular extended attribute. - When the new value is NULL, called to remove a particular extended - attribute. This method is called by the the setxattr(2) and - removexattr(2) system calls. +``set`` + Called by the VFS to set the value of a particular extended + attribute. When the new value is NULL, called to remove a + particular extended attribute. This method is called by the the + setxattr(2) and removexattr(2) system calls. When none of the xattr handlers of a filesystem match the specified attribute name or when a filesystem doesn't support extended attributes, @@ -399,128 +442,147 @@ As of kernel 2.6.22, the following members are defined: Again, all methods are called without any locks being held, unless otherwise noted. -``create``: called by the open(2) and creat(2) system calls. Only - required if you want to support regular files. The dentry you - get should not have an inode (i.e. it should be a negative - dentry). 
Here you will probably call d_instantiate() with the - dentry and the newly created inode +``create`` + called by the open(2) and creat(2) system calls. Only required + if you want to support regular files. The dentry you get should + not have an inode (i.e. it should be a negative dentry). Here + you will probably call d_instantiate() with the dentry and the + newly created inode -``lookup``: called when the VFS needs to look up an inode in a parent +``lookup`` + called when the VFS needs to look up an inode in a parent directory. The name to look for is found in the dentry. This method must call d_add() to insert the found inode into the dentry. The "i_count" field in the inode structure should be incremented. If the named inode does not exist a NULL inode should be inserted into the dentry (this is called a negative - dentry). Returning an error code from this routine must only - be done on a real error, otherwise creating inodes with system + dentry). Returning an error code from this routine must only be + done on a real error, otherwise creating inodes with system calls like create(2), mknod(2), mkdir(2) and so on will fail. If you wish to overload the dentry methods then you should - initialise the "d_dop" field in the dentry; this is a pointer - to a struct "dentry_operations". - This method is called with the directory inode semaphore held + initialise the "d_dop" field in the dentry; this is a pointer to + a struct "dentry_operations". This method is called with the + directory inode semaphore held -``link``: called by the link(2) system call. Only required if you want - to support hard links. You will probably need to call +``link`` + called by the link(2) system call. Only required if you want to + support hard links. You will probably need to call d_instantiate() just as you would in the create() method -``unlink``: called by the unlink(2) system call. 
Only required if you - want to support deleting inodes +``unlink`` + called by the unlink(2) system call. Only required if you want + to support deleting inodes -``symlink``: called by the symlink(2) system call. Only required if you - want to support symlinks. You will probably need to call +``symlink`` + called by the symlink(2) system call. Only required if you want + to support symlinks. You will probably need to call d_instantiate() just as you would in the create() method -``mkdir``: called by the mkdir(2) system call. Only required if you want +``mkdir`` + called by the mkdir(2) system call. Only required if you want to support creating subdirectories. You will probably need to call d_instantiate() just as you would in the create() method -``rmdir``: called by the rmdir(2) system call. Only required if you want +``rmdir`` + called by the rmdir(2) system call. Only required if you want to support deleting subdirectories -``mknod``: called by the mknod(2) system call to create a device (char, - block) inode or a named pipe (FIFO) or socket. Only required - if you want to support creating these types of inodes. You - will probably need to call d_instantiate() just as you would - in the create() method +``mknod`` + called by the mknod(2) system call to create a device (char, + block) inode or a named pipe (FIFO) or socket. Only required if + you want to support creating these types of inodes. You will + probably need to call d_instantiate() just as you would in the + create() method -``rename``: called by the rename(2) system call to rename the object to - have the parent and name given by the second inode and dentry. +``rename`` + called by the rename(2) system call to rename the object to have + the parent and name given by the second inode and dentry. The filesystem must return -EINVAL for any unsupported or - unknown flags. 
Currently the following flags are implemented: - (1) RENAME_NOREPLACE: this flag indicates that if the target - of the rename exists the rename should fail with -EEXIST - instead of replacing the target. The VFS already checks for - existence, so for local filesystems the RENAME_NOREPLACE - implementation is equivalent to plain rename. + unknown flags. Currently the following flags are implemented: + (1) RENAME_NOREPLACE: this flag indicates that if the target of + the rename exists the rename should fail with -EEXIST instead of + replacing the target. The VFS already checks for existence, so + for local filesystems the RENAME_NOREPLACE implementation is + equivalent to plain rename. (2) RENAME_EXCHANGE: exchange source and target. Both must - exist; this is checked by the VFS. Unlike plain rename, - source and target may be of different type. + exist; this is checked by the VFS. Unlike plain rename, source + and target may be of different type. -``get_link``: called by the VFS to follow a symbolic link to the - inode it points to. Only required if you want to support - symbolic links. This method returns the symlink body - to traverse (and possibly resets the current position with - nd_jump_link()). If the body won't go away until the inode - is gone, nothing else is needed; if it needs to be otherwise - pinned, arrange for its release by having get_link(..., ..., done) - do set_delayed_call(done, destructor, argument). - In that case destructor(argument) will be called once VFS is - done with the body you've returned. - May be called in RCU mode; that is indicated by NULL dentry +``get_link`` + called by the VFS to follow a symbolic link to the inode it + points to. Only required if you want to support symbolic links. + This method returns the symlink body to traverse (and possibly + resets the current position with nd_jump_link()). 
If the body + won't go away until the inode is gone, nothing else is needed; + if it needs to be otherwise pinned, arrange for its release by + having get_link(..., ..., done) do set_delayed_call(done, + destructor, argument). In that case destructor(argument) will + be called once VFS is done with the body you've returned. May + be called in RCU mode; that is indicated by NULL dentry argument. If request can't be handled without leaving RCU mode, have it return ERR_PTR(-ECHILD). - If the filesystem stores the symlink target in ->i_link, the VFS may use it directly without calling ->get_link(); however, ->get_link() must still be provided. ->i_link must not be freed until after an RCU grace period. Writing to ->i_link post-iget() time requires a 'release' memory barrier. -``readlink``: this is now just an override for use by readlink(2) for the +``readlink`` + this is now just an override for use by readlink(2) for the cases when ->get_link uses nd_jump_link() or object is not in fact a symlink. Normally filesystems should only implement ->get_link for symlinks and readlink(2) will automatically use that. -``permission``: called by the VFS to check for access rights on a POSIX-like +``permission`` + called by the VFS to check for access rights on a POSIX-like filesystem. - May be called in rcu-walk mode (mask & MAY_NOT_BLOCK). If in rcu-walk - mode, the filesystem must check the permission without blocking or - storing to the inode. + May be called in rcu-walk mode (mask & MAY_NOT_BLOCK). If in + rcu-walk mode, the filesystem must check the permission without + blocking or storing to the inode. - If a situation is encountered that rcu-walk cannot handle, return + If a situation is encountered that rcu-walk cannot handle, + return -ECHILD and it will be called again in ref-walk mode. -``setattr``: called by the VFS to set attributes for a file. This method - is called by chmod(2) and related system calls. 
+``setattr`` + called by the VFS to set attributes for a file. This method is + called by chmod(2) and related system calls. -``getattr``: called by the VFS to get attributes of a file. This method - is called by stat(2) and related system calls. +``getattr`` + called by the VFS to get attributes of a file. This method is + called by stat(2) and related system calls. -``listxattr``: called by the VFS to list all extended attributes for a - given file. This method is called by the listxattr(2) system call. +``listxattr`` + called by the VFS to list all extended attributes for a given + file. This method is called by the listxattr(2) system call. -``update_time``: called by the VFS to update a specific time or the i_version of - an inode. If this is not defined the VFS will update the inode itself - and call mark_inode_dirty_sync. +``update_time`` + called by the VFS to update a specific time or the i_version of + an inode. If this is not defined the VFS will update the inode + itself and call mark_inode_dirty_sync. -``atomic_open``: called on the last component of an open. Using this optional - method the filesystem can look up, possibly create and open the file in - one atomic operation. If it wants to leave actual opening to the - caller (e.g. if the file turned out to be a symlink, device, or just - something filesystem won't do atomic open for), it may signal this by - returning finish_no_open(file, dentry). This method is only called if - the last component is negative or needs lookup. Cached positive dentries - are still handled by f_op->open(). If the file was created, - FMODE_CREATED flag should be set in file->f_mode. In case of O_EXCL - the method must only succeed if the file didn't exist and hence FMODE_CREATED - shall always be set on success. +``atomic_open`` + called on the last component of an open. Using this optional + method the filesystem can look up, possibly create and open the + file in one atomic operation. 
If it wants to leave actual + opening to the caller (e.g. if the file turned out to be a + symlink, device, or just something filesystem won't do atomic + open for), it may signal this by returning finish_no_open(file, + dentry). This method is only called if the last component is + negative or needs lookup. Cached positive dentries are still + handled by f_op->open(). If the file was created, FMODE_CREATED + flag should be set in file->f_mode. In case of O_EXCL the + method must only succeed if the file didn't exist and hence + FMODE_CREATED shall always be set on success. -``tmpfile``: called in the end of O_TMPFILE open(). Optional, equivalent to - atomically creating, opening and unlinking a file in given directory. +``tmpfile`` + called in the end of O_TMPFILE open(). Optional, equivalent to + atomically creating, opening and unlinking a file in given + directory. The Address Space Object @@ -673,70 +735,75 @@ cache in your filesystem. The following members are defined: int (*swap_deactivate)(struct file *); }; -``writepage``: called by the VM to write a dirty page to backing store. - This may happen for data integrity reasons (i.e. 'sync'), or - to free up memory (flush). The difference can be seen in - wbc->sync_mode. - The PG_Dirty flag has been cleared and PageLocked is true. - writepage should start writeout, should set PG_Writeback, - and should make sure the page is unlocked, either synchronously - or asynchronously when the write operation completes. +``writepage`` + called by the VM to write a dirty page to backing store. This + may happen for data integrity reasons (i.e. 'sync'), or to free + up memory (flush). The difference can be seen in + wbc->sync_mode. The PG_Dirty flag has been cleared and + PageLocked is true. writepage should start writeout, should set + PG_Writeback, and should make sure the page is unlocked, either + synchronously or asynchronously when the write operation + completes. 
- If wbc->sync_mode is WB_SYNC_NONE, ->writepage doesn't have to - try too hard if there are problems, and may choose to write out - other pages from the mapping if that is easier (e.g. due to - internal dependencies). If it chooses not to start writeout, it - should return AOP_WRITEPAGE_ACTIVATE so that the VM will not keep - calling ->writepage on that page. + If wbc->sync_mode is WB_SYNC_NONE, ->writepage doesn't have to + try too hard if there are problems, and may choose to write out + other pages from the mapping if that is easier (e.g. due to + internal dependencies). If it chooses not to start writeout, it + should return AOP_WRITEPAGE_ACTIVATE so that the VM will not + keep calling ->writepage on that page. - See the file "Locking" for more details. + See the file "Locking" for more details. -``readpage``: called by the VM to read a page from backing store. - The page will be Locked when readpage is called, and should be - unlocked and marked uptodate once the read completes. - If ->readpage discovers that it needs to unlock the page for - some reason, it can do so, and then return AOP_TRUNCATED_PAGE. - In this case, the page will be relocated, relocked and if - that all succeeds, ->readpage will be called again. +``readpage`` + called by the VM to read a page from backing store. The page + will be Locked when readpage is called, and should be unlocked + and marked uptodate once the read completes. If ->readpage + discovers that it needs to unlock the page for some reason, it + can do so, and then return AOP_TRUNCATED_PAGE. In this case, + the page will be relocated, relocked and if that all succeeds, + ->readpage will be called again. -``writepages``: called by the VM to write out pages associated with the +``writepages`` + called by the VM to write out pages associated with the address_space object. If wbc->sync_mode is WBC_SYNC_ALL, then the writeback_control will specify a range of pages that must be - written out. 
If it is WBC_SYNC_NONE, then a nr_to_write is given - and that many pages should be written if possible. - If no ->writepages is given, then mpage_writepages is used - instead. This will choose pages from the address space that are - tagged as DIRTY and will pass them to ->writepage. + written out. If it is WBC_SYNC_NONE, then a nr_to_write is + given and that many pages should be written if possible. If no + ->writepages is given, then mpage_writepages is used instead. + This will choose pages from the address space that are tagged as + DIRTY and will pass them to ->writepage. -``set_page_dirty``: called by the VM to set a page dirty. - This is particularly needed if an address space attaches - private data to a page, and that data needs to be updated when - a page is dirtied. This is called, for example, when a memory - mapped page gets modified. +``set_page_dirty`` + called by the VM to set a page dirty. This is particularly + needed if an address space attaches private data to a page, and + that data needs to be updated when a page is dirtied. This is + called, for example, when a memory mapped page gets modified. If defined, it should set the PageDirty flag, and the PAGECACHE_TAG_DIRTY tag in the radix tree. -``readpages``: called by the VM to read pages associated with the address_space - object. This is essentially just a vector version of - readpage. Instead of just one page, several pages are - requested. +``readpages`` + called by the VM to read pages associated with the address_space + object. This is essentially just a vector version of readpage. + Instead of just one page, several pages are requested. readpages is only used for read-ahead, so read errors are ignored. If anything goes wrong, feel free to give up. -``write_begin``: - Called by the generic buffered write code to ask the filesystem to - prepare to write len bytes at the given offset in the file. 
The - address_space should check that the write will be able to complete, - by allocating space if necessary and doing any other internal - housekeeping. If the write will update parts of any basic-blocks on - storage, then those blocks should be pre-read (if they haven't been - read already) so that the updated blocks can be written out properly. +``write_begin`` + Called by the generic buffered write code to ask the filesystem + to prepare to write len bytes at the given offset in the file. + The address_space should check that the write will be able to + complete, by allocating space if necessary and doing any other + internal housekeeping. If the write will update parts of any + basic-blocks on storage, then those blocks should be pre-read + (if they haven't been read already) so that the updated blocks + can be written out properly. - The filesystem must return the locked pagecache page for the specified - offset, in ``*pagep``, for the caller to write into. + The filesystem must return the locked pagecache page for the + specified offset, in ``*pagep``, for the caller to write into. - It must be able to cope with short writes (where the length passed to - write_begin is greater than the number of bytes copied into the page). + It must be able to cope with short writes (where the length + passed to write_begin is greater than the number of bytes copied + into the page). flags is a field for AOP_FLAG_xxx flags, described in include/linux/fs.h. @@ -744,114 +811,128 @@ cache in your filesystem. The following members are defined: A void * may be returned in fsdata, which then gets passed into write_end. - Returns 0 on success; < 0 on failure (which is the error code), in - which case write_end is not called. + Returns 0 on success; < 0 on failure (which is the error code), + in which case write_end is not called. -``write_end``: After a successful write_begin, and data copy, write_end must - be called. 
len is the original len passed to write_begin, and copied - is the amount that was able to be copied. +``write_end`` + After a successful write_begin, and data copy, write_end must be + called. len is the original len passed to write_begin, and + copied is the amount that was able to be copied. - The filesystem must take care of unlocking the page and releasing it - refcount, and updating i_size. + The filesystem must take care of unlocking the page and + releasing it refcount, and updating i_size. - Returns < 0 on failure, otherwise the number of bytes (<= 'copied') - that were able to be copied into pagecache. + Returns < 0 on failure, otherwise the number of bytes (<= + 'copied') that were able to be copied into pagecache. -``bmap``: called by the VFS to map a logical block offset within object to - physical block number. This method is used by the FIBMAP - ioctl and for working with swap-files. To be able to swap to - a file, the file must have a stable mapping to a block - device. The swap system does not go through the filesystem - but instead uses bmap to find out where the blocks in the file - are and uses those addresses directly. +``bmap`` + called by the VFS to map a logical block offset within object to + physical block number. This method is used by the FIBMAP ioctl + and for working with swap-files. To be able to swap to a file, + the file must have a stable mapping to a block device. The swap + system does not go through the filesystem but instead uses bmap + to find out where the blocks in the file are and uses those + addresses directly. -``invalidatepage``: If a page has PagePrivate set, then invalidatepage - will be called when part or all of the page is to be removed - from the address space. 
This generally corresponds to either a - truncation, punch hole or a complete invalidation of the address +``invalidatepage`` + If a page has PagePrivate set, then invalidatepage will be + called when part or all of the page is to be removed from the + address space. This generally corresponds to either a + truncation, punch hole or a complete invalidation of the address space (in the latter case 'offset' will always be 0 and 'length' will be PAGE_SIZE). Any private data associated with the page - should be updated to reflect this truncation. If offset is 0 and - length is PAGE_SIZE, then the private data should be released, - because the page must be able to be completely discarded. This may - be done by calling the ->releasepage function, but in this case the - release MUST succeed. + should be updated to reflect this truncation. If offset is 0 + and length is PAGE_SIZE, then the private data should be + released, because the page must be able to be completely + discarded. This may be done by calling the ->releasepage + function, but in this case the release MUST succeed. -``releasepage``: releasepage is called on PagePrivate pages to indicate - that the page should be freed if possible. ->releasepage - should remove any private data from the page and clear the - PagePrivate flag. If releasepage() fails for some reason, it must - indicate failure with a 0 return value. - releasepage() is used in two distinct though related cases. The - first is when the VM finds a clean page with no active users and - wants to make it a free page. If ->releasepage succeeds, the - page will be removed from the address_space and become free. +``releasepage`` + releasepage is called on PagePrivate pages to indicate that the + page should be freed if possible. ->releasepage should remove + any private data from the page and clear the PagePrivate flag. + If releasepage() fails for some reason, it must indicate failure + with a 0 return value. 
releasepage() is used in two distinct + though related cases. The first is when the VM finds a clean + page with no active users and wants to make it a free page. If + ->releasepage succeeds, the page will be removed from the + address_space and become free. The second case is when a request has been made to invalidate - some or all pages in an address_space. This can happen - through the fadvise(POSIX_FADV_DONTNEED) system call or by the - filesystem explicitly requesting it as nfs and 9fs do (when - they believe the cache may be out of date with storage) by - calling invalidate_inode_pages2(). - If the filesystem makes such a call, and needs to be certain - that all pages are invalidated, then its releasepage will - need to ensure this. Possibly it can clear the PageUptodate - bit if it cannot free private data yet. + some or all pages in an address_space. This can happen through + the fadvise(POSIX_FADV_DONTNEED) system call or by the + filesystem explicitly requesting it as nfs and 9fs do (when they + believe the cache may be out of date with storage) by calling + invalidate_inode_pages2(). If the filesystem makes such a call, + and needs to be certain that all pages are invalidated, then its + releasepage will need to ensure this. Possibly it can clear the + PageUptodate bit if it cannot free private data yet. -``freepage``: freepage is called once the page is no longer visible in - the page cache in order to allow the cleanup of any private - data. Since it may be called by the memory reclaimer, it - should not assume that the original address_space mapping still - exists, and it should not block. +``freepage`` + freepage is called once the page is no longer visible in the + page cache in order to allow the cleanup of any private data. + Since it may be called by the memory reclaimer, it should not + assume that the original address_space mapping still exists, and + it should not block. 
-``direct_IO``: called by the generic read/write routines to perform - direct_IO - that is IO requests which bypass the page cache - and transfer data directly between the storage and the - application's address space. +``direct_IO`` + called by the generic read/write routines to perform direct_IO - + that is IO requests which bypass the page cache and transfer + data directly between the storage and the application's address + space. -``isolate_page``: Called by the VM when isolating a movable non-lru page. - If page is successfully isolated, VM marks the page as PG_isolated - via __SetPageIsolated. +``isolate_page`` + Called by the VM when isolating a movable non-lru page. If page + is successfully isolated, VM marks the page as PG_isolated via + __SetPageIsolated. -``migrate_page``: This is used to compact the physical memory usage. - If the VM wants to relocate a page (maybe off a memory card - that is signalling imminent failure) it will pass a new page - and an old page to this function. migrate_page should - transfer any private data across and update any references - that it has to the page. +``migrate_page`` + This is used to compact the physical memory usage. If the VM + wants to relocate a page (maybe off a memory card that is + signalling imminent failure) it will pass a new page and an old + page to this function. migrate_page should transfer any private + data across and update any references that it has to the page. -``putback_page``: Called by the VM when isolated page's migration fails. +``putback_page`` + Called by the VM when isolated page's migration fails. -``launder_page``: Called before freeing a page - it writes back the dirty page. To - prevent redirtying the page, it is kept locked during the whole - operation. +``launder_page`` + Called before freeing a page - it writes back the dirty page. + To prevent redirtying the page, it is kept locked during the + whole operation. 
-``is_partially_uptodate``: Called by the VM when reading a file through the - pagecache when the underlying blocksize != pagesize. If the required - block is up to date then the read can complete without needing the IO - to bring the whole page up to date. +``is_partially_uptodate`` + Called by the VM when reading a file through the pagecache when + the underlying blocksize != pagesize. If the required block is + up to date then the read can complete without needing the IO to + bring the whole page up to date. -``is_dirty_writeback``: Called by the VM when attempting to reclaim a page. - The VM uses dirty and writeback information to determine if it needs - to stall to allow flushers a chance to complete some IO. Ordinarily - it can use PageDirty and PageWriteback but some filesystems have - more complex state (unstable pages in NFS prevent reclaim) or - do not set those flags due to locking problems. This callback - allows a filesystem to indicate to the VM if a page should be - treated as dirty or writeback for the purposes of stalling. +``is_dirty_writeback`` + Called by the VM when attempting to reclaim a page. The VM uses + dirty and writeback information to determine if it needs to + stall to allow flushers a chance to complete some IO. + Ordinarily it can use PageDirty and PageWriteback but some + filesystems have more complex state (unstable pages in NFS + prevent reclaim) or do not set those flags due to locking + problems. This callback allows a filesystem to indicate to the + VM if a page should be treated as dirty or writeback for the + purposes of stalling. -``error_remove_page``: normally set to generic_error_remove_page if truncation - is ok for this address space. Used for memory failure handling. +``error_remove_page`` + normally set to generic_error_remove_page if truncation is ok + for this address space. Used for memory failure handling. 
Setting this implies you deal with pages going away under you, unless you have them locked or reference counts increased. -``swap_activate``: Called when swapon is used on a file to allocate - space if necessary and pin the block lookup information in - memory. A return value of zero indicates success, - in which case this file can be used to back swapspace. +``swap_activate`` + Called when swapon is used on a file to allocate space if + necessary and pin the block lookup information in memory. A + return value of zero indicates success, in which case this file + can be used to back swapspace. -``swap_deactivate``: Called during swapoff on files where swap_activate - was successful. +``swap_deactivate`` + Called during swapoff on files where swap_activate was + successful. The File Object @@ -912,91 +993,120 @@ This describes how the VFS can manipulate an open file. As of kernel Again, all methods are called without any locks being held, unless otherwise noted. -``llseek``: called when the VFS needs to move the file position index +``llseek`` + called when the VFS needs to move the file position index -``read``: called by read(2) and related system calls +``read`` + called by read(2) and related system calls -``read_iter``: possibly asynchronous read with iov_iter as destination +``read_iter`` + possibly asynchronous read with iov_iter as destination -``write``: called by write(2) and related system calls +``write`` + called by write(2) and related system calls -``write_iter``: possibly asynchronous write with iov_iter as source +``write_iter`` + possibly asynchronous write with iov_iter as source -``iopoll``: called when aio wants to poll for completions on HIPRI iocbs +``iopoll`` + called when aio wants to poll for completions on HIPRI iocbs -``iterate``: called when the VFS needs to read the directory contents +``iterate`` + called when the VFS needs to read the directory contents -``iterate_shared``: called when the VFS needs to read the directory contents - 
when filesystem supports concurrent dir iterators +``iterate_shared`` + called when the VFS needs to read the directory contents when + filesystem supports concurrent dir iterators -``poll``: called by the VFS when a process wants to check if there is +``poll`` + called by the VFS when a process wants to check if there is activity on this file and (optionally) go to sleep until there is activity. Called by the select(2) and poll(2) system calls -``unlocked_ioctl``: called by the ioctl(2) system call. +``unlocked_ioctl`` + called by the ioctl(2) system call. -``compat_ioctl``: called by the ioctl(2) system call when 32 bit system calls - are used on 64 bit kernels. +``compat_ioctl`` + called by the ioctl(2) system call when 32 bit system calls are + used on 64 bit kernels. -``mmap``: called by the mmap(2) system call +``mmap`` + called by the mmap(2) system call -``open``: called by the VFS when an inode should be opened. When the VFS +``open`` + called by the VFS when an inode should be opened. When the VFS opens a file, it creates a new "struct file". It then calls the open method for the newly allocated file structure. You might - think that the open method really belongs in - "struct inode_operations", and you may be right. I think it's - done the way it is because it makes filesystems simpler to - implement. The open() method is a good place to initialize the + think that the open method really belongs in "struct + inode_operations", and you may be right. I think it's done the + way it is because it makes filesystems simpler to implement. 
+ The open() method is a good place to initialize the "private_data" member in the file structure if you want to point to a device structure -``flush``: called by the close(2) system call to flush a file +``flush`` + called by the close(2) system call to flush a file -``release``: called when the last reference to an open file is closed +``release`` + called when the last reference to an open file is closed -``fsync``: called by the fsync(2) system call. Also see the section above - entitled "Handling errors during writeback". +``fsync`` + called by the fsync(2) system call. Also see the section above + entitled "Handling errors during writeback". -``fasync``: called by the fcntl(2) system call when asynchronous +``fasync`` + called by the fcntl(2) system call when asynchronous (non-blocking) mode is enabled for a file -``lock``: called by the fcntl(2) system call for F_GETLK, F_SETLK, and F_SETLKW - commands +``lock`` + called by the fcntl(2) system call for F_GETLK, F_SETLK, and + F_SETLKW commands -``get_unmapped_area``: called by the mmap(2) system call +``get_unmapped_area`` + called by the mmap(2) system call -``check_flags``: called by the fcntl(2) system call for F_SETFL command +``check_flags`` + called by the fcntl(2) system call for F_SETFL command -``flock``: called by the flock(2) system call +``flock`` + called by the flock(2) system call -``splice_write``: called by the VFS to splice data from a pipe to a file. This - method is used by the splice(2) system call +``splice_write`` + called by the VFS to splice data from a pipe to a file. This + method is used by the splice(2) system call -``splice_read``: called by the VFS to splice data from file to a pipe. This - method is used by the splice(2) system call +``splice_read`` + called by the VFS to splice data from file to a pipe. This + method is used by the splice(2) system call -``setlease``: called by the VFS to set or release a file lock lease. 
setlease - implementations should call generic_setlease to record or remove - the lease in the inode after setting it. +``setlease`` + called by the VFS to set or release a file lock lease. setlease + implementations should call generic_setlease to record or remove + the lease in the inode after setting it. -``fallocate``: called by the VFS to preallocate blocks or punch a hole. +``fallocate`` + called by the VFS to preallocate blocks or punch a hole. -``copy_file_range``: called by the copy_file_range(2) system call. +``copy_file_range`` + called by the copy_file_range(2) system call. -``remap_file_range``: called by the ioctl(2) system call for FICLONERANGE and - FICLONE and FIDEDUPERANGE commands to remap file ranges. An - implementation should remap len bytes at pos_in of the source file into - the dest file at pos_out. Implementations must handle callers passing - in len == 0; this means "remap to the end of the source file". The - return value should the number of bytes remapped, or the usual - negative error code if errors occurred before any bytes were remapped. - The remap_flags parameter accepts REMAP_FILE_* flags. If - REMAP_FILE_DEDUP is set then the implementation must only remap if the - requested file ranges have identical contents. If REMAP_CAN_SHORTEN is - set, the caller is ok with the implementation shortening the request - length to satisfy alignment or EOF requirements (or any other reason). +``remap_file_range`` + called by the ioctl(2) system call for FICLONERANGE and FICLONE + and FIDEDUPERANGE commands to remap file ranges. An + implementation should remap len bytes at pos_in of the source + file into the dest file at pos_out. Implementations must handle + callers passing in len == 0; this means "remap to the end of the + source file". The return value should the number of bytes + remapped, or the usual negative error code if errors occurred + before any bytes were remapped. The remap_flags parameter + accepts REMAP_FILE_* flags. 
If REMAP_FILE_DEDUP is set then the + implementation must only remap if the requested file ranges have + identical contents. If REMAP_CAN_SHORTEN is set, the caller is + ok with the implementation shortening the request length to + satisfy alignment or EOF requirements (or any other reason). -``fadvise``: possibly called by the fadvise64() system call. +``fadvise`` + possibly called by the fadvise64() system call. Note that the file operations are implemented by the specific filesystem in which the inode resides. When opening a device node @@ -1041,89 +1151,104 @@ defined: struct dentry *(*d_real)(struct dentry *, const struct inode *); }; -``d_revalidate``: called when the VFS needs to revalidate a dentry. This - is called whenever a name look-up finds a dentry in the - dcache. Most local filesystems leave this as NULL, because all their - dentries in the dcache are valid. Network filesystems are different - since things can change on the server without the client necessarily - being aware of it. +``d_revalidate`` + called when the VFS needs to revalidate a dentry. This is + called whenever a name look-up finds a dentry in the dcache. + Most local filesystems leave this as NULL, because all their + dentries in the dcache are valid. Network filesystems are + different since things can change on the server without the + client necessarily being aware of it. - This function should return a positive value if the dentry is still - valid, and zero or a negative error code if it isn't. + This function should return a positive value if the dentry is + still valid, and zero or a negative error code if it isn't. - d_revalidate may be called in rcu-walk mode (flags & LOOKUP_RCU). - If in rcu-walk mode, the filesystem must revalidate the dentry without - blocking or storing to the dentry, d_parent and d_inode should not be - used without care (because they can change and, in d_inode case, even - become NULL under us). 
+ d_revalidate may be called in rcu-walk mode (flags & + LOOKUP_RCU). If in rcu-walk mode, the filesystem must + revalidate the dentry without blocking or storing to the dentry, + d_parent and d_inode should not be used without care (because + they can change and, in d_inode case, even become NULL under + us). - If a situation is encountered that rcu-walk cannot handle, return + If a situation is encountered that rcu-walk cannot handle, + return -ECHILD and it will be called again in ref-walk mode. -``_weak_revalidate``: called when the VFS needs to revalidate a "jumped" dentry. - This is called when a path-walk ends at dentry that was not acquired by - doing a lookup in the parent directory. This includes "/", "." and "..", - as well as procfs-style symlinks and mountpoint traversal. +``_weak_revalidate`` + called when the VFS needs to revalidate a "jumped" dentry. This + is called when a path-walk ends at dentry that was not acquired + by doing a lookup in the parent directory. This includes "/", + "." and "..", as well as procfs-style symlinks and mountpoint + traversal. - In this case, we are less concerned with whether the dentry is still - fully correct, but rather that the inode is still valid. As with - d_revalidate, most local filesystems will set this to NULL since their - dcache entries are always valid. + In this case, we are less concerned with whether the dentry is + still fully correct, but rather that the inode is still valid. + As with d_revalidate, most local filesystems will set this to + NULL since their dcache entries are always valid. - This function has the same return code semantics as d_revalidate. + This function has the same return code semantics as + d_revalidate. d_weak_revalidate is only called after leaving rcu-walk mode. -``d_hash``: called when the VFS adds a dentry to the hash table. The first +``d_hash`` + called when the VFS adds a dentry to the hash table. 
The first dentry passed to d_hash is the parent directory that the name is to be hashed into. Same locking and synchronisation rules as d_compare regarding what is safe to dereference etc. -``d_compare``: called to compare a dentry name with a given name. The first +``d_compare`` + called to compare a dentry name with a given name. The first dentry is the parent of the dentry to be compared, the second is - the child dentry. len and name string are properties of the dentry - to be compared. qstr is the name to compare it with. + the child dentry. len and name string are properties of the + dentry to be compared. qstr is the name to compare it with. Must be constant and idempotent, and should not take locks if - possible, and should not or store into the dentry. - Should not dereference pointers outside the dentry without - lots of care (eg. d_parent, d_inode, d_name should not be used). + possible, and should not or store into the dentry. Should not + dereference pointers outside the dentry without lots of care + (eg. d_parent, d_inode, d_name should not be used). - However, our vfsmount is pinned, and RCU held, so the dentries and - inodes won't disappear, neither will our sb or filesystem module. - ->d_sb may be used. + However, our vfsmount is pinned, and RCU held, so the dentries + and inodes won't disappear, neither will our sb or filesystem + module. ->d_sb may be used. - It is a tricky calling convention because it needs to be called under - "rcu-walk", ie. without any locks or references on things. + It is a tricky calling convention because it needs to be called + under "rcu-walk", ie. without any locks or references on things. -``d_delete``: called when the last reference to a dentry is dropped and the - dcache is deciding whether or not to cache it. Return 1 to delete - immediately, or 0 to cache the dentry. Default is NULL which means to - always cache a reachable dentry. d_delete must be constant and - idempotent. 
+``d_delete`` + called when the last reference to a dentry is dropped and the + dcache is deciding whether or not to cache it. Return 1 to + delete immediately, or 0 to cache the dentry. Default is NULL + which means to always cache a reachable dentry. d_delete must + be constant and idempotent. -``d_init``: called when a dentry is allocated +``d_init`` + called when a dentry is allocated -``d_release``: called when a dentry is really deallocated +``d_release`` + called when a dentry is really deallocated -``d_iput``: called when a dentry loses its inode (just prior to its - being deallocated). The default when this is NULL is that the - VFS calls iput(). If you define this method, you must call - iput() yourself +``d_iput`` + called when a dentry loses its inode (just prior to its being + deallocated). The default when this is NULL is that the VFS + calls iput(). If you define this method, you must call iput() + yourself -``d_dname``: called when the pathname of a dentry should be generated. - Useful for some pseudo filesystems (sockfs, pipefs, ...) to delay - pathname generation. (Instead of doing it when dentry is created, - it's done only when the path is needed.). Real filesystems probably - dont want to use it, because their dentries are present in global - dcache hash, so their hash should be an invariant. As no lock is - held, d_dname() should not try to modify the dentry itself, unless - appropriate SMP safety is used. CAUTION : d_path() logic is quite - tricky. The correct way to return for example "Hello" is to put it - at the end of the buffer, and returns a pointer to the first char. - dynamic_dname() helper function is provided to take care of this. +``d_dname`` + called when the pathname of a dentry should be generated. + Useful for some pseudo filesystems (sockfs, pipefs, ...) to + delay pathname generation. (Instead of doing it when dentry is + created, it's done only when the path is needed.). 
Real + filesystems probably dont want to use it, because their dentries + are present in global dcache hash, so their hash should be an + invariant. As no lock is held, d_dname() should not try to + modify the dentry itself, unless appropriate SMP safety is used. + CAUTION : d_path() logic is quite tricky. The correct way to + return for example "Hello" is to put it at the end of the + buffer, and returns a pointer to the first char. + dynamic_dname() helper function is provided to take care of + this. Example : @@ -1135,52 +1260,57 @@ defined: dentry->d_inode->i_ino); } -``d_automount``: called when an automount dentry is to be traversed (optional). - This should create a new VFS mount record and return the record to the - caller. The caller is supplied with a path parameter giving the - automount directory to describe the automount target and the parent - VFS mount record to provide inheritable mount parameters. NULL should - be returned if someone else managed to make the automount first. If - the vfsmount creation failed, then an error code should be returned. - If -EISDIR is returned, then the directory will be treated as an - ordinary directory and returned to pathwalk to continue walking. +``d_automount`` + called when an automount dentry is to be traversed (optional). + This should create a new VFS mount record and return the record + to the caller. The caller is supplied with a path parameter + giving the automount directory to describe the automount target + and the parent VFS mount record to provide inheritable mount + parameters. NULL should be returned if someone else managed to + make the automount first. If the vfsmount creation failed, then + an error code should be returned. If -EISDIR is returned, then + the directory will be treated as an ordinary directory and + returned to pathwalk to continue walking. 
- If a vfsmount is returned, the caller will attempt to mount it on the - mountpoint and will remove the vfsmount from its expiration list in - the case of failure. The vfsmount should be returned with 2 refs on - it to prevent automatic expiration - the caller will clean up the - additional ref. + If a vfsmount is returned, the caller will attempt to mount it + on the mountpoint and will remove the vfsmount from its + expiration list in the case of failure. The vfsmount should be + returned with 2 refs on it to prevent automatic expiration - the + caller will clean up the additional ref. - This function is only used if DCACHE_NEED_AUTOMOUNT is set on the - dentry. This is set by __d_instantiate() if S_AUTOMOUNT is set on the - inode being added. + This function is only used if DCACHE_NEED_AUTOMOUNT is set on + the dentry. This is set by __d_instantiate() if S_AUTOMOUNT is + set on the inode being added. -``d_manage``: called to allow the filesystem to manage the transition from a - dentry (optional). This allows autofs, for example, to hold up clients - waiting to explore behind a 'mountpoint' while letting the daemon go - past and construct the subtree there. 0 should be returned to let the - calling process continue. -EISDIR can be returned to tell pathwalk to - use this directory as an ordinary directory and to ignore anything - mounted on it and not to check the automount flag. Any other error - code will abort pathwalk completely. +``d_manage`` + called to allow the filesystem to manage the transition from a + dentry (optional). This allows autofs, for example, to hold up + clients waiting to explore behind a 'mountpoint' while letting + the daemon go past and construct the subtree there. 0 should be + returned to let the calling process continue. -EISDIR can be + returned to tell pathwalk to use this directory as an ordinary + directory and to ignore anything mounted on it and not to check + the automount flag. 
Any other error code will abort pathwalk + completely. If the 'rcu_walk' parameter is true, then the caller is doing a - pathwalk in RCU-walk mode. Sleeping is not permitted in this mode, - and the caller can be asked to leave it and call again by returning - -ECHILD. -EISDIR may also be returned to tell pathwalk to - ignore d_automount or any mounts. + pathwalk in RCU-walk mode. Sleeping is not permitted in this + mode, and the caller can be asked to leave it and call again by + returning -ECHILD. -EISDIR may also be returned to tell + pathwalk to ignore d_automount or any mounts. - This function is only used if DCACHE_MANAGE_TRANSIT is set on the - dentry being transited from. + This function is only used if DCACHE_MANAGE_TRANSIT is set on + the dentry being transited from. -``d_real``: overlay/union type filesystems implement this method to return one of - the underlying dentries hidden by the overlay. It is used in two - different modes: +``d_real`` + overlay/union type filesystems implement this method to return + one of the underlying dentries hidden by the overlay. It is + used in two different modes: - Called from file_dentry() it returns the real dentry matching the inode - argument. The real dentry may be from a lower layer already copied up, - but still referenced from the file. This mode is selected with a - non-NULL inode argument. + Called from file_dentry() it returns the real dentry matching + the inode argument. The real dentry may be from a lower layer + already copied up, but still referenced from the file. This + mode is selected with a non-NULL inode argument. With NULL inode the topmost real underlying dentry is returned. 
@@ -1195,40 +1325,47 @@ Directory Entry Cache API There are a number of functions defined which permit a filesystem to manipulate dentries: -``dget``: open a new handle for an existing dentry (this just increments +``dget`` + open a new handle for an existing dentry (this just increments the usage count) -``dput``: close a handle for a dentry (decrements the usage count). If +``dput`` + close a handle for a dentry (decrements the usage count). If the usage count drops to 0, and the dentry is still in its parent's hash, the "d_delete" method is called to check whether - it should be cached. If it should not be cached, or if the dentry - is not hashed, it is deleted. Otherwise cached dentries are put - into an LRU list to be reclaimed on memory shortage. + it should be cached. If it should not be cached, or if the + dentry is not hashed, it is deleted. Otherwise cached dentries + are put into an LRU list to be reclaimed on memory shortage. -``d_drop``: this unhashes a dentry from its parents hash list. A - subsequent call to dput() will deallocate the dentry if its - usage count drops to 0 +``d_drop`` + this unhashes a dentry from its parents hash list. A subsequent + call to dput() will deallocate the dentry if its usage count + drops to 0 -``d_delete``: delete a dentry. If there are no other open references to - the dentry then the dentry is turned into a negative dentry - (the d_iput() method is called). If there are other - references, then d_drop() is called instead +``d_delete`` + delete a dentry. If there are no other open references to the + dentry then the dentry is turned into a negative dentry (the + d_iput() method is called). If there are other references, then + d_drop() is called instead -``d_add``: add a dentry to its parents hash list and then calls +``d_add`` + add a dentry to its parents hash list and then calls d_instantiate() -``d_instantiate``: add a dentry to the alias hash list for the inode and - updates the "d_inode" member. 
The "i_count" member in the - inode structure should be set/incremented. If the inode - pointer is NULL, the dentry is called a "negative - dentry". This function is commonly called when an inode is - created for an existing negative dentry +``d_instantiate`` + add a dentry to the alias hash list for the inode and updates + the "d_inode" member. The "i_count" member in the inode + structure should be set/incremented. If the inode pointer is + NULL, the dentry is called a "negative dentry". This function + is commonly called when an inode is created for an existing + negative dentry -``d_lookup``: look up a dentry given its parent and path name component - It looks up the child of that given name from the dcache - hash table. If it is found, the reference count is incremented - and the dentry is returned. The caller must use dput() - to free the dentry when it finishes using it. +``d_lookup`` + look up a dentry given its parent and path name component It + looks up the child of that given name from the dcache hash + table. If it is found, the reference count is incremented and + the dentry is returned. The caller must use dput() to free the + dentry when it finishes using it. Mount Options From b422124758c19db06c4c30c4abb8f57bf18995b9 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Wed, 5 Jun 2019 19:39:44 +0300 Subject: [PATCH 032/129] docs/core-api: Add string helpers API to the list Sometimes string helpers are needed, but there is nothing about them in the generated documentation. Fill the gap by adding a reference to the functions exported by string_helpers.c.
Signed-off-by: Andy Shevchenko Acked-by: Mike Rapoport Signed-off-by: Jonathan Corbet --- Documentation/core-api/kernel-api.rst | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Documentation/core-api/kernel-api.rst b/Documentation/core-api/kernel-api.rst index a53ec2eb8176..65ae2bf1f86d 100644 --- a/Documentation/core-api/kernel-api.rst +++ b/Documentation/core-api/kernel-api.rst @@ -33,6 +33,9 @@ String Conversions .. kernel-doc:: lib/kstrtox.c :export: +.. kernel-doc:: lib/string_helpers.c + :export: + String Manipulation ------------------- From 58d494669f36d0b61b7ec42c232877167ed3f5ce Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Wed, 5 Jun 2019 19:51:13 +0300 Subject: [PATCH 033/129] docs/core-api: Add integer power functions to the list Sometimes integer power functions, such as int_sqrt(), are needed, but there is nothing about them in the generated documentation. Fill the gap by adding a reference to the corresponding exported functions. Signed-off-by: Andy Shevchenko Acked-by: Mike Rapoport Signed-off-by: Jonathan Corbet --- Documentation/core-api/kernel-api.rst | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/Documentation/core-api/kernel-api.rst b/Documentation/core-api/kernel-api.rst index 65ae2bf1f86d..824f24ccf401 100644 --- a/Documentation/core-api/kernel-api.rst +++ b/Documentation/core-api/kernel-api.rst @@ -141,6 +141,15 @@ Base 2 log and power Functions .. kernel-doc:: include/linux/log2.h :internal: +Integer power Functions +----------------------- + +.. kernel-doc:: lib/math/int_pow.c + :export: + +.. 
kernel-doc:: lib/math/int_sqrt.c + :export: + Division Functions ------------------ From 99d2b938672944831035bef50c68a6e948e93abf Mon Sep 17 00:00:00 2001 From: Yoshihiro Shimoda Date: Fri, 7 Jun 2019 16:47:13 +0900 Subject: [PATCH 034/129] Documentation: DMA-API: fix a function name of max_mapping_size The exported function name is dma_max_mapping_size(), not dma_direct_max_mapping_size(), so this patch fixes the function name in the documentation. Fixes: 133d624b1cee ("dma: Introduce dma_max_mapping_size()") Signed-off-by: Yoshihiro Shimoda Signed-off-by: Jonathan Corbet --- Documentation/DMA-API.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt index 0076150fdccb..e47c63bd4887 100644 --- a/Documentation/DMA-API.txt +++ b/Documentation/DMA-API.txt @@ -198,7 +198,7 @@ call to set the mask to the value returned. :: size_t - dma_direct_max_mapping_size(struct device *dev); + dma_max_mapping_size(struct device *dev); Returns the maximum size of a mapping for the device. The size parameter of the mapping functions like dma_map_single(), dma_map_page() and From 4241d516b0041ae55092fb12739e12184427de5d Mon Sep 17 00:00:00 2001 From: Helen Koike Date: Tue, 4 Jun 2019 15:27:19 -0300 Subject: [PATCH 035/129] Documentation/dm-init: fix multi device example The example in the docs regarding multiple device-mappers is invalid (it has a wrong number of arguments); it is a leftover from previous versions of the patch. Replace the example with a valid and tested one.
Signed-off-by: Helen Koike Reviewed-by: Stephen Boyd Signed-off-by: Jonathan Corbet --- Documentation/device-mapper/dm-init.txt | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/Documentation/device-mapper/dm-init.txt b/Documentation/device-mapper/dm-init.txt index 8464ee7c01b8..130b3c3679c5 100644 --- a/Documentation/device-mapper/dm-init.txt +++ b/Documentation/device-mapper/dm-init.txt @@ -74,13 +74,13 @@ this target to /dev/mapper/lroot (depending on the rules). No uuid was assigned. An example of multiple device-mappers, with the dm-mod.create="..." contents is shown here split on multiple lines for readability: - vroot,,,ro, - 0 1740800 verity 254:0 254:0 1740800 sha1 - 76e9be054b15884a9fa85973e9cb274c93afadb6 - 5b3549d54d6c7a3837b9b81ed72e49463a64c03680c47835bef94d768e5646fe; - vram,,,rw, - 0 32768 linear 1:0 0, - 32768 32768 linear 1:1 0 + dm-linear,,1,rw, + 0 32768 linear 8:1 0, + 32768 1024000 linear 8:2 0; + dm-verity,,3,ro, + 0 1638400 verity 1 /dev/sdc1 /dev/sdc2 4096 4096 204800 1 sha256 + ac87db56303c9c1da433d7209b5a6ef3e4779df141200cbd7c157dcb8dd89c42 + 5ebfe87f7df3235b80a117ebc4078e44f55045487ad4a96581d1adb564615b51 Other examples (per target): From e0cef9ff6315d48a4dfd39da09ca770e242f9cb5 Mon Sep 17 00:00:00 2001 From: Aurelien Thierry Date: Fri, 7 Jun 2019 10:07:02 +0200 Subject: [PATCH 036/129] Documentation: fix typo CLOCK_MONONOTNIC_COARSE Fix typo in documentation file timekeeping.rst: CLOCK_MONONOTNIC_COARSE should be CLOCK_MONOTONIC_COARSE. 
Signed-off-by: Aurelien Thierry Signed-off-by: Jonathan Corbet --- Documentation/core-api/timekeeping.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/core-api/timekeeping.rst b/Documentation/core-api/timekeeping.rst index 93cbeb9daec0..5f87d9c8b04d 100644 --- a/Documentation/core-api/timekeeping.rst +++ b/Documentation/core-api/timekeeping.rst @@ -111,7 +111,7 @@ Some additional variants exist for more specialized cases: void ktime_get_coarse_raw_ts64( struct timespec64 * ) These are quicker than the non-coarse versions, but less accurate, - corresponding to CLOCK_MONONOTNIC_COARSE and CLOCK_REALTIME_COARSE + corresponding to CLOCK_MONOTONIC_COARSE and CLOCK_REALTIME_COARSE in user space, along with the equivalent boottime/tai/raw timebase not available in user space. From e47cf0c958775700c74223a1f21a8b3457c57069 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Fri, 7 Jun 2019 13:07:29 +0200 Subject: [PATCH 037/129] Documentation: tee: Grammar s/the its/its/ Signed-off-by: Geert Uytterhoeven Signed-off-by: Jonathan Corbet --- Documentation/tee.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/tee.txt b/Documentation/tee.txt index 56ea85ffebf2..afacdf2fd1de 100644 --- a/Documentation/tee.txt +++ b/Documentation/tee.txt @@ -32,7 +32,7 @@ User space (the client) connects to the driver by opening /dev/tee[0-9]* or memory. - TEE_IOC_VERSION lets user space know which TEE this driver handles and - the its capabilities. + its capabilities. - TEE_IOC_OPEN_SESSION opens a new session to a Trusted Application. 
From 6fb44c439eda692f94cf60aad55f130a34204ece Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Fri, 7 Jun 2019 13:08:42 +0200 Subject: [PATCH 038/129] Documentation: net: dsa: Grammar s/the its/its/ Signed-off-by: Geert Uytterhoeven Reviewed-by: Andrew Lunn Signed-off-by: Jonathan Corbet --- Documentation/networking/dsa/dsa.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/networking/dsa/dsa.rst b/Documentation/networking/dsa/dsa.rst index ca87068b9ab9..563d56c6a25c 100644 --- a/Documentation/networking/dsa/dsa.rst +++ b/Documentation/networking/dsa/dsa.rst @@ -531,7 +531,7 @@ Bridge VLAN filtering a software implementation. .. note:: VLAN ID 0 corresponds to the port private database, which, in the context - of DSA, would be the its port-based VLAN, used by the associated bridge device. + of DSA, would be its port-based VLAN, used by the associated bridge device. - ``port_fdb_del``: bridge layer function invoked when the bridge wants to remove a Forwarding Database entry, the switch hardware should be programmed to delete @@ -554,7 +554,7 @@ Bridge VLAN filtering associated with this VLAN ID. .. note:: VLAN ID 0 corresponds to the port private database, which, in the context - of DSA, would be the its port-based VLAN, used by the associated bridge device. + of DSA, would be its port-based VLAN, used by the associated bridge device. - ``port_mdb_del``: bridge layer function invoked when the bridge wants to remove a multicast database entry, the switch hardware should be programmed to delete From 3f9564e680efb2092dfb826e2f768920c9eb203b Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Fri, 7 Jun 2019 13:29:51 +0200 Subject: [PATCH 039/129] KVM: arm/arm64: Always capitalize ITS All but one reference is capitalized. Fix the remaining one. 
Signed-off-by: Geert Uytterhoeven Signed-off-by: Jonathan Corbet --- Documentation/virtual/kvm/devices/arm-vgic-its.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/virtual/kvm/devices/arm-vgic-its.txt b/Documentation/virtual/kvm/devices/arm-vgic-its.txt index 4f0c9fc40365..eeaa95b893a8 100644 --- a/Documentation/virtual/kvm/devices/arm-vgic-its.txt +++ b/Documentation/virtual/kvm/devices/arm-vgic-its.txt @@ -103,7 +103,7 @@ Groups: The following ordering must be followed when restoring the GIC and the ITS: a) restore all guest memory and create vcpus b) restore all redistributors -c) provide the its base address +c) provide the ITS base address (KVM_DEV_ARM_VGIC_GRP_ADDR) d) restore the ITS in the following order: 1. Restore GITS_CBASER From b1663d7e3a7961fc45262fd68a89253f2803036c Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Tue, 4 Jun 2019 09:26:27 -0300 Subject: [PATCH 040/129] docs: Kbuild/Makefile: allow check for missing docs at build time While this doesn't make sense for production Kernels, in order to avoid regressions when documents are touched, let's add a check target at the make file. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- Documentation/Kconfig | 13 +++++++++++++ Documentation/Makefile | 5 +++++ Kconfig | 2 ++ scripts/documentation-file-ref-check | 9 +++++++++ 4 files changed, 29 insertions(+) create mode 100644 Documentation/Kconfig diff --git a/Documentation/Kconfig b/Documentation/Kconfig new file mode 100644 index 000000000000..66046fa1c341 --- /dev/null +++ b/Documentation/Kconfig @@ -0,0 +1,13 @@ +config WARN_MISSING_DOCUMENTS + + bool "Warn if there's a missing documentation file" + depends on COMPILE_TEST + help + It is not uncommon that a document gets renamed. + This option makes the Kernel to check for missing dependencies, + warning when something is missing. Works only if the Kernel + is built from a git tree. + + If unsure, select 'N'. 
+ + diff --git a/Documentation/Makefile b/Documentation/Makefile index 2df0789f90b7..e145e4db508b 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -4,6 +4,11 @@ subdir-y := devicetree/bindings/ +# Check for broken documentation file references +ifeq ($(CONFIG_WARN_MISSING_DOCUMENTS),y) +$(shell $(srctree)/scripts/documentation-file-ref-check --warn) +endif + # You can set these variables from the command line. SPHINXBUILD = sphinx-build SPHINXOPTS = diff --git a/Kconfig b/Kconfig index 48a80beab685..990b0c390dfc 100644 --- a/Kconfig +++ b/Kconfig @@ -30,3 +30,5 @@ source "crypto/Kconfig" source "lib/Kconfig" source "lib/Kconfig.debug" + +source "Documentation/Kconfig" diff --git a/scripts/documentation-file-ref-check b/scripts/documentation-file-ref-check index ff16db269079..440227bb55a9 100755 --- a/scripts/documentation-file-ref-check +++ b/scripts/documentation-file-ref-check @@ -22,9 +22,16 @@ $scriptname =~ s,.*/([^/]+/),$1,; # Parse arguments my $help = 0; my $fix = 0; +my $warn = 0; + +if (! -d ".git") { + printf "Warning: can't check if file exists, as this is not a git tree"; + exit 0; +} GetOptions( 'fix' => \$fix, + 'warn' => \$warn, 'h|help|usage' => \$help, ); @@ -139,6 +146,8 @@ while () { if (!($ref =~ m/(scripts|Kconfig|Kbuild)/)) { $broken_ref{$ref}++; } + } elsif ($warn) { + print STDERR "Warning: $f references a file that doesn't exist: $fulref\n"; } else { print STDERR "$f: $fulref\n"; } From 889aa9ca930602a0e860cfb89e467c2a7a729b1b Mon Sep 17 00:00:00 2001 From: Luca Ceresoli Date: Fri, 31 May 2019 16:30:16 +0200 Subject: [PATCH 041/129] docs: clk: fix struct syntax The clk_foo_ops struct example has syntax errors. Fix it so it can be copy-pasted and used more easily. 
Signed-off-by: Luca Ceresoli Signed-off-by: Jonathan Corbet --- Documentation/driver-api/clk.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/driver-api/clk.rst b/Documentation/driver-api/clk.rst index 593cca5058b1..3cad45d14187 100644 --- a/Documentation/driver-api/clk.rst +++ b/Documentation/driver-api/clk.rst @@ -175,9 +175,9 @@ the following:: To take advantage of your data you'll need to support valid operations for your clk:: - struct clk_ops clk_foo_ops { - .enable = &clk_foo_enable; - .disable = &clk_foo_disable; + struct clk_ops clk_foo_ops = { + .enable = &clk_foo_enable, + .disable = &clk_foo_disable, }; Implement the above functions using container_of:: From 54002b56b04bc83f8961c8751f6bfef07461d587 Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Thu, 30 May 2019 16:59:14 -0500 Subject: [PATCH 042/129] scripts/sphinx-pre-install: fix "dependenties" typo Fix typo ("dependenties" for "dependencies"). Signed-off-by: Bjorn Helgaas Signed-off-by: Jonathan Corbet --- scripts/sphinx-pre-install | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/sphinx-pre-install b/scripts/sphinx-pre-install index f001fc2fcf12..158f522f12ed 100755 --- a/scripts/sphinx-pre-install +++ b/scripts/sphinx-pre-install @@ -632,7 +632,7 @@ sub check_needs() } printf "\n"; - print "All optional dependenties are met.\n" if (!$optional); + print "All optional dependencies are met.\n" if (!$optional); if ($need == 1) { die "Can't build as $need mandatory dependency is missing"; From 165915c17d681c61962251728d72ecdabe95518e Mon Sep 17 00:00:00 2001 From: Federico Vaga Date: Thu, 30 May 2019 22:14:54 +0200 Subject: [PATCH 043/129] doc:it_IT: fix file references Fix italian translation file references based on `scripts/documentation-file-ref-check` output. 
Signed-off-by: Federico Vaga Reviewed-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- .../it_IT/admin-guide/kernel-parameters.rst | 12 ++++++++++++ .../translations/it_IT/process/adding-syscalls.rst | 2 +- .../translations/it_IT/process/coding-style.rst | 2 +- Documentation/translations/it_IT/process/howto.rst | 2 +- .../translations/it_IT/process/magic-number.rst | 2 +- .../it_IT/process/stable-kernel-rules.rst | 4 ++-- 6 files changed, 18 insertions(+), 6 deletions(-) create mode 100644 Documentation/translations/it_IT/admin-guide/kernel-parameters.rst diff --git a/Documentation/translations/it_IT/admin-guide/kernel-parameters.rst b/Documentation/translations/it_IT/admin-guide/kernel-parameters.rst new file mode 100644 index 000000000000..0e36d82a92be --- /dev/null +++ b/Documentation/translations/it_IT/admin-guide/kernel-parameters.rst @@ -0,0 +1,12 @@ +.. include:: ../disclaimer-ita.rst + +:Original: :ref:`Documentation/admin-guide/kernel-parameters.rst ` + +.. _it_kernelparameters: + +I parametri da linea di comando del kernel +========================================== + +.. warning:: + + TODO ancora da tradurre diff --git a/Documentation/translations/it_IT/process/adding-syscalls.rst b/Documentation/translations/it_IT/process/adding-syscalls.rst index e0a64b0688a7..c3a3439595a6 100644 --- a/Documentation/translations/it_IT/process/adding-syscalls.rst +++ b/Documentation/translations/it_IT/process/adding-syscalls.rst @@ -39,7 +39,7 @@ vostra interfaccia. un qualche modo opaca. - Se dovete esporre solo delle informazioni sul sistema, un nuovo nodo in - sysfs (vedere ``Documentation/translations/it_IT/filesystems/sysfs.txt``) o + sysfs (vedere ``Documentation/filesystems/sysfs.txt``) o in procfs potrebbe essere sufficiente. Tuttavia, l'accesso a questi meccanismi richiede che il filesystem sia montato, il che potrebbe non essere sempre vero (per esempio, in ambienti come namespace/sandbox/chroot). 
diff --git a/Documentation/translations/it_IT/process/coding-style.rst b/Documentation/translations/it_IT/process/coding-style.rst index 5ef534c95e69..a6559d25a23d 100644 --- a/Documentation/translations/it_IT/process/coding-style.rst +++ b/Documentation/translations/it_IT/process/coding-style.rst @@ -696,7 +696,7 @@ nella stringa di titolo:: ... Per la documentazione completa sui file di configurazione, consultate -il documento Documentation/translations/it_IT/kbuild/kconfig-language.txt +il documento Documentation/kbuild/kconfig-language.txt 11) Strutture dati diff --git a/Documentation/translations/it_IT/process/howto.rst b/Documentation/translations/it_IT/process/howto.rst index 9903ac7c566b..44e6077730e8 100644 --- a/Documentation/translations/it_IT/process/howto.rst +++ b/Documentation/translations/it_IT/process/howto.rst @@ -131,7 +131,7 @@ Di seguito una lista di file che sono presenti nei sorgente del kernel e che "Linux kernel patch submission format" http://linux.yyz.us/patch-format.html - :ref:`Documentation/process/translations/it_IT/stable-api-nonsense.rst ` + :ref:`Documentation/translations/it_IT/process/stable-api-nonsense.rst ` Questo file descrive la motivazioni sottostanti la conscia decisione di non avere un API stabile all'interno del kernel, incluso cose come: diff --git a/Documentation/translations/it_IT/process/magic-number.rst b/Documentation/translations/it_IT/process/magic-number.rst index 5281d53e57ee..ed1121d0ba84 100644 --- a/Documentation/translations/it_IT/process/magic-number.rst +++ b/Documentation/translations/it_IT/process/magic-number.rst @@ -1,6 +1,6 @@ .. include:: ../disclaimer-ita.rst -:Original: :ref:`Documentation/process/magic-numbers.rst ` +:Original: :ref:`Documentation/process/magic-number.rst ` :Translator: Federico Vaga .. 
_it_magicnumbers: diff --git a/Documentation/translations/it_IT/process/stable-kernel-rules.rst b/Documentation/translations/it_IT/process/stable-kernel-rules.rst index 48e88e5ad2c5..4f206cee31a7 100644 --- a/Documentation/translations/it_IT/process/stable-kernel-rules.rst +++ b/Documentation/translations/it_IT/process/stable-kernel-rules.rst @@ -33,7 +33,7 @@ Regole sul tipo di patch che vengono o non vengono accettate nei sorgenti - Non deve includere alcuna correzione "banale" (correzioni grammaticali, pulizia dagli spazi bianchi, eccetera). - Deve rispettare le regole scritte in - :ref:`Documentation/translation/it_IT/process/submitting-patches.rst ` + :ref:`Documentation/translations/it_IT/process/submitting-patches.rst ` - Questa patch o una equivalente deve esistere già nei sorgenti principali di Linux @@ -43,7 +43,7 @@ Procedura per sottomettere patch per i sorgenti -stable - Se la patch contiene modifiche a dei file nelle cartelle net/ o drivers/net, allora seguite le linee guida descritte in - :ref:`Documentation/translation/it_IT/networking/netdev-FAQ.rst `; + :ref:`Documentation/translations/it_IT/networking/netdev-FAQ.rst `; ma solo dopo aver verificato al seguente indirizzo che la patch non sia già in coda: https://patchwork.ozlabs.org/bundle/davem/stable/?series=&submitter=&state=*&q=&archive= From bed0918d64ca28169d55bd138ed20f09e288303e Mon Sep 17 00:00:00 2001 From: Federico Vaga Date: Thu, 30 May 2019 22:14:55 +0200 Subject: [PATCH 044/129] doc:it_IT: documentation alignment Documentation alignment for the following changes: a700767a7682 (doc/docs-next) docs: requirements.txt: recommend Sphinx 1.7.9 Signed-off-by: Federico Vaga Signed-off-by: Jonathan Corbet --- .../translations/it_IT/doc-guide/sphinx.rst | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/Documentation/translations/it_IT/doc-guide/sphinx.rst b/Documentation/translations/it_IT/doc-guide/sphinx.rst index 793b5cc33403..1739cba8863e 100644 --- 
a/Documentation/translations/it_IT/doc-guide/sphinx.rst +++ b/Documentation/translations/it_IT/doc-guide/sphinx.rst @@ -35,8 +35,7 @@ Installazione Sphinx ==================== I marcatori ReST utilizzati nei file in Documentation/ sono pensati per essere -processati da ``Sphinx`` nella versione 1.3 o superiore. Se desiderate produrre -un documento PDF è raccomandato l'utilizzo di una versione superiore alle 1.4.6. +processati da ``Sphinx`` nella versione 1.3 o superiore. Esiste uno script che verifica i requisiti Sphinx. Per ulteriori dettagli consultate :ref:`it_sphinx-pre-install`. @@ -68,13 +67,13 @@ pacchettizzato dalla vostra distribuzione. utilizzando LaTeX. Per una corretta interpretazione, è necessario aver installato texlive con i pacchetti amdfonts e amsmath. -Riassumendo, se volete installare la versione 1.4.9 di Sphinx dovete eseguire:: +Riassumendo, se volete installare la versione 1.7.9 di Sphinx dovete eseguire:: - $ virtualenv sphinx_1.4 - $ . sphinx_1.4/bin/activate - (sphinx_1.4) $ pip install -r Documentation/sphinx/requirements.txt + $ virtualenv sphinx_1.7.9 + $ . sphinx_1.7.9/bin/activate + (sphinx_1.7.9) $ pip install -r Documentation/sphinx/requirements.txt -Dopo aver eseguito ``. sphinx_1.4/bin/activate``, il prompt cambierà per +Dopo aver eseguito ``. sphinx_1.7.9/bin/activate``, il prompt cambierà per indicare che state usando il nuovo ambiente. Se aprite un nuova sessione, prima di generare la documentazione, dovrete rieseguire questo comando per rientrare nell'ambiente virtuale. @@ -120,8 +119,8 @@ l'installazione:: You should run: sudo dnf install -y texlive-luatex85 - /usr/bin/virtualenv sphinx_1.4 - . sphinx_1.4/bin/activate + /usr/bin/virtualenv sphinx_1.7.9 + . sphinx_1.7.9/bin/activate pip install -r Documentation/sphinx/requirements.txt Can't build as 1 mandatory dependency is missing at ./scripts/sphinx-pre-install line 468. 
From 3d9cf48b2ca257f1a249b347236098c3cf9d54f1 Mon Sep 17 00:00:00 2001 From: Shiyang Ruan Date: Thu, 9 May 2019 15:40:49 +0800 Subject: [PATCH 045/129] Documentation: nvdimm: Fix typo Remove the extra 'we '. Signed-off-by: Shiyang Ruan Signed-off-by: Jonathan Corbet --- Documentation/nvdimm/nvdimm.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/nvdimm/nvdimm.txt b/Documentation/nvdimm/nvdimm.txt index e894de69915a..1669f626b037 100644 --- a/Documentation/nvdimm/nvdimm.txt +++ b/Documentation/nvdimm/nvdimm.txt @@ -284,8 +284,8 @@ A bus has a 1:1 relationship with an NFIT. The current expectation for ACPI based systems is that there is only ever one platform-global NFIT. That said, it is trivial to register multiple NFITs, the specification does not preclude it. The infrastructure supports multiple busses and -we we use this capability to test multiple NFIT configurations in the -unit test. +we use this capability to test multiple NFIT configurations in the unit +test. LIBNVDIMM: control class device in /sys/class From 9d61944356590c40b13f6b1f99df84260e4db0c1 Mon Sep 17 00:00:00 2001 From: Shiyang Ruan Date: Thu, 9 May 2019 11:05:49 +0800 Subject: [PATCH 046/129] Documentation: xfs: Fix typo MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In "Y+P" of this line, two non-ASCII characters (0xd9 0x8d) follow the 'Y'. They are shown as a small '=' under the '+' in Vim and as a '賺' on the web page[1]. This looks like a mistake, so remove these stray characters.
[1]: https://www.kernel.org/doc/Documentation/filesystems/xfs-delayed-logging-design.txt Signed-off-by: Shiyang Ruan Signed-off-by: Jonathan Corbet --- Documentation/filesystems/xfs-delayed-logging-design.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/filesystems/xfs-delayed-logging-design.txt b/Documentation/filesystems/xfs-delayed-logging-design.txt index 2ce36439c09f..9a6dd289b17b 100644 --- a/Documentation/filesystems/xfs-delayed-logging-design.txt +++ b/Documentation/filesystems/xfs-delayed-logging-design.txt @@ -34,7 +34,7 @@ transaction: D A+B+C+D X+n+m+o E E Y (> X+n+m+o) - F E+F Yٍ+p + F E+F Y+p In other words, each time an object is relogged, the new transaction contains the aggregation of all the previous changes currently held only in the log. From 462e5a521ab73f7762583add73cbab1662612beb Mon Sep 17 00:00:00 2001 From: "George G. Davis" Date: Wed, 5 Jun 2019 16:30:10 -0400 Subject: [PATCH 047/129] treewide: trivial: fix s/poped/popped/ typo Fix a couple of s/poped/popped/ typos. Signed-off-by: George G. Davis Acked-by: Steven Rostedt (VMware) Acked-by: Masami Hiramatsu Signed-off-by: Jonathan Corbet --- Documentation/arm/mem_alignment | 2 +- arch/x86/kernel/kprobes/core.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/arm/mem_alignment b/Documentation/arm/mem_alignment index 6335fcacbba9..e110e2781039 100644 --- a/Documentation/arm/mem_alignment +++ b/Documentation/arm/mem_alignment @@ -1,4 +1,4 @@ -Too many problems poped up because of unnoticed misaligned memory access in +Too many problems popped up because of unnoticed misaligned memory access in kernel code lately. Therefore the alignment fixup is now unconditionally configured in for SA11x0 based targets. 
According to Alan Cox, this is a bad idea to configure it out, but Russell King has some good reasons for diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c index 9e4fa2484d10..1de809afaf65 100644 --- a/arch/x86/kernel/kprobes/core.c +++ b/arch/x86/kernel/kprobes/core.c @@ -826,7 +826,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs) continue; /* * Return probes must be pushed on this hash list correct - * order (same as return order) so that it can be poped + * order (same as return order) so that it can be popped * correctly. However, if we find it is pushed it incorrect * order, this means we find a function which should not be * probed, because the wrong order entry is pushed on the From 78a89463a31ce463a4b968553f57ff9932a0697f Mon Sep 17 00:00:00 2001 From: Lecopzer Chen Date: Thu, 9 May 2019 18:31:16 +0800 Subject: [PATCH 048/129] Documentation: {u,k}probes: add tracing_on before tracing When following the document step by step, `cat trace` does not work unless tracing_on is enabled, which might mislead newcomers about the functionality. Signed-off-by: Lecopzer Chen Acked-by: Masami Hiramatsu Signed-off-by: Jonathan Corbet --- Documentation/trace/kprobetrace.rst | 6 ++++++ Documentation/trace/uprobetracer.rst | 7 ++++++- 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst index 235ce2ab131a..baa3c42ba2f4 100644 --- a/Documentation/trace/kprobetrace.rst +++ b/Documentation/trace/kprobetrace.rst @@ -189,6 +189,12 @@ events, you need to enable it. echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable echo 1 > /sys/kernel/debug/tracing/events/kprobes/myretprobe/enable +Use the following command to start tracing in an interval. +:: + # echo 1 > tracing_on + Open something... + # echo 0 > tracing_on + And you can see the traced information via /sys/kernel/debug/tracing/trace.
:: diff --git a/Documentation/trace/uprobetracer.rst b/Documentation/trace/uprobetracer.rst index 4346e23e3ae7..0b21305fabdc 100644 --- a/Documentation/trace/uprobetracer.rst +++ b/Documentation/trace/uprobetracer.rst @@ -152,10 +152,15 @@ events, you need to enable it by:: # echo 1 > events/uprobes/enable -Lets disable the event after sleeping for some time. +Lets start tracing, sleep for some time and stop tracing. :: + # echo 1 > tracing_on # sleep 20 + # echo 0 > tracing_on + +Also, you can disable the event by:: + # echo 0 > events/uprobes/enable And you can see the traced information via /sys/kernel/debug/tracing/trace. From 671c30957e78a822917cf0b04c4592e9813f7f9b Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 7 Jun 2019 15:54:17 -0300 Subject: [PATCH 049/129] ABI: sysfs-devices-system-cpu: point to the right docs The cpuidle doc was split on two, one at the admin guide and another one at the driver API guide. Instead of pointing to a non-existent file, point to both (admin guide being the first one). Signed-off-by: Mauro Carvalho Chehab Acked-by: Rafael J. Wysocki Signed-off-by: Jonathan Corbet --- Documentation/ABI/testing/sysfs-devices-system-cpu | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index 1528239f69b2..87478ac6c2af 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -137,7 +137,8 @@ Description: Discover cpuidle policy and mechanism current_governor: (RW) displays current idle policy. Users can switch the governor at runtime by writing to this file. - See files in Documentation/cpuidle/ for more information. + See Documentation/admin-guide/pm/cpuidle.rst and + Documentation/driver-api/pm/cpuidle.rst for more information. 
What: /sys/devices/system/cpu/cpuX/cpuidle/stateN/name From 8b01caee99fb07218908c0ac9be8c758878f33f9 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 7 Jun 2019 15:54:18 -0300 Subject: [PATCH 050/129] isdn: mISDN: remove a bogus reference to a non-existing doc The mISDN driver was added in these commits: 960366cf8dbb ("Add mISDN DSP") 1b2b03f8e514 ("Add mISDN core files") 04578dd330f1 ("Define AF_ISDN and PF_ISDN") e4ac9bc1f668 ("Add mISDN driver") None of them added a Documentation/isdn/mISDN.cert file. Also, whatever was supposed to be written there at that time probably doesn't make any sense nowadays, as I doubt isdn will see any massive changes. So, let's just get rid of the broken reference, in order to silence a warning produced by ./scripts/documentation-file-ref-check. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- drivers/isdn/mISDN/dsp_core.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/isdn/mISDN/dsp_core.c b/drivers/isdn/mISDN/dsp_core.c index cd036e87335a..038e72a84b33 100644 --- a/drivers/isdn/mISDN/dsp_core.c +++ b/drivers/isdn/mISDN/dsp_core.c @@ -4,8 +4,6 @@ * Karsten Keil (keil@isdn4linux.de) * * This file is (c) under GNU PUBLIC LICENSE - * For changes and modifications please read - * ../../../Documentation/isdn/mISDN.cert * * Thanks to Karsten Keil (great drivers) * Cologne Chip (great chips) From 065efe27872ca942b53b9f11d5b3f534a9c33857 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 7 Jun 2019 15:54:19 -0300 Subject: [PATCH 051/129] docs: zh_CN: get rid of basic_profiling.txt Changeset 5700d1974818 ("docs: Get rid of the "basic profiling" guide") removed an old basic-profiling.txt file that had not been updated in the last 11 years and does not reflect the post-perf era. It makes no sense to keep its translation, so get rid of it too.
Fixes: 5700d1974818 ("docs: Get rid of the "basic profiling" guide") Signed-off-by: Mauro Carvalho Chehab Acked-by: Alex Shi Signed-off-by: Jonathan Corbet --- .../translations/zh_CN/basic_profiling.txt | 71 ------------------- 1 file changed, 71 deletions(-) delete mode 100644 Documentation/translations/zh_CN/basic_profiling.txt diff --git a/Documentation/translations/zh_CN/basic_profiling.txt b/Documentation/translations/zh_CN/basic_profiling.txt deleted file mode 100644 index 1e6bf0bdf8f5..000000000000 --- a/Documentation/translations/zh_CN/basic_profiling.txt +++ /dev/null @@ -1,71 +0,0 @@ -Chinese translated version of Documentation/basic_profiling - -If you have any comment or update to the content, please post to LKML directly. -However, if you have problem communicating in English you can also ask the -Chinese maintainer for help. Contact the Chinese maintainer, if this -translation is outdated or there is problem with translation. - -Chinese maintainer: Liang Xie ---------------------------------------------------------------------- -Documentation/basic_profiling的中文翻译 - -如果想评论或更新本文的内容,请直接发信到LKML。如果你使用英文交流有困难的话,也可 -以向中文版维护者求助。如果本翻译更新不及时或者翻译存在问题,请联系中文版维护者。 - -中文版维护者: 谢良 Liang Xie -中文版翻译者: 谢良 Liang Xie -中文版校译者: -以下为正文 ---------------------------------------------------------------------- - -下面这些说明指令都是非常基础的,如果你想进一步了解请阅读相关专业文档:) -请不要再在本文档增加新的内容,但可以修复文档中的错误:)(mbligh@aracnet.com) -感谢John Levon,Dave Hansen等在撰写时的帮助 - - 用于表示要测量的目标 -请先确保您已经有正确的System.map / vmlinux配置! - -对于linux系统来说,配置vmlinuz最容易的方法可能就是使用“make install”,然后修改 -/sbin/installkernel将vmlinux拷贝到/boot目录,而System.map通常是默认安装好的 - -Readprofile ------------ -2.6系列内核需要版本相对较新的readprofile,比如util-linux 2.12a中包含的,可以从: - -http://www.kernel.org/pub/linux/utils/util-linux/ 下载 - -大部分linux发行版已经包含了. 
- -启用readprofile需要在kernel启动命令行增加”profile=2“ - -clear readprofile -r - -dump output readprofile -m /boot/System.map > captured_profile - -Oprofile --------- - -从http://oprofile.sourceforge.net/获取源代码(请参考Changes以获取匹配的版本) -在kernel启动命令行增加“idle=poll” - -配置CONFIG_PROFILING=y和CONFIG_OPROFILE=y然后重启进入新kernel - -./configure --with-kernel-support -make install - -想得到好的测量结果,请确保启用了本地APIC特性。如果opreport显示有0Hz CPU, -说明APIC特性没有开启。另外注意idle=poll选项可能有损性能。 - -One time setup: - opcontrol --setup --vmlinux=/boot/vmlinux - -clear opcontrol --reset -start opcontrol --start - -stop opcontrol --stop -dump output opreport > output_file - -如果只看kernel相关的报告结果,请运行命令 opreport -l /boot/vmlinux > output_file - -通过reset选项可以清理过期统计数据,相当于重启的效果。 - From 2e03e3a42c961b709926ba5f7c42c09ea6bfb8c1 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 7 Jun 2019 15:54:20 -0300 Subject: [PATCH 052/129] docs: mm: numaperf.rst: get rid of a build warning When building it, it gets this warning: Documentation/admin-guide/mm/numaperf.rst:168: WARNING: Footnote [1] is not referenced. The problem is that this is not really a reference, as it is not mentioned within the documentation. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/mm/numaperf.rst | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/Documentation/admin-guide/mm/numaperf.rst b/Documentation/admin-guide/mm/numaperf.rst index c067ed145158..a80c3c37226e 100644 --- a/Documentation/admin-guide/mm/numaperf.rst +++ b/Documentation/admin-guide/mm/numaperf.rst @@ -165,5 +165,6 @@ write-through caching. ======== See Also ======== -.. 
[1] https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf - Section 5.2.27 + +[1] https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf +- Section 5.2.27 From d857a3ffd3d609d1c822b255d4fe4db8b3464e34 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 7 Jun 2019 15:54:21 -0300 Subject: [PATCH 053/129] docs: bpf: get rid of two warnings Documentation/bpf/btf.rst:154: WARNING: Unexpected indentation. Documentation/bpf/btf.rst:163: WARNING: Unexpected indentation. Signed-off-by: Mauro Carvalho Chehab Acked-by: Song Liu Signed-off-by: Jonathan Corbet --- Documentation/bpf/btf.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Documentation/bpf/btf.rst b/Documentation/bpf/btf.rst index 8820360d00da..4ae022d274ab 100644 --- a/Documentation/bpf/btf.rst +++ b/Documentation/bpf/btf.rst @@ -151,6 +151,7 @@ for the type. The maximum value of ``BTF_INT_BITS()`` is 128. The ``BTF_INT_OFFSET()`` specifies the starting bit offset to calculate values for this int. For example, a bitfield struct member has: + * btf member bit offset 100 from the start of the structure, * btf member pointing to an int type, * the int type has ``BTF_INT_OFFSET() = 2`` and ``BTF_INT_BITS() = 4`` @@ -160,6 +161,7 @@ from bits ``100 + 2 = 102``. 
Alternatively, the bitfield struct member can be the following to access the same bits as the above: + * btf member bit offset 102, * btf member pointing to an int type, * the int type has ``BTF_INT_OFFSET() = 0`` and ``BTF_INT_BITS() = 4`` From 27c054d2939f1a46a4da62732e71c140e664afb9 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 7 Jun 2019 15:54:22 -0300 Subject: [PATCH 054/129] docs: mark orphan documents as such Sphinx doesn't like orphan documents: Documentation/accelerators/ocxl.rst: WARNING: document isn't included in any toctree Documentation/arm/stm32/overview.rst: WARNING: document isn't included in any toctree Documentation/arm/stm32/stm32f429-overview.rst: WARNING: document isn't included in any toctree Documentation/arm/stm32/stm32f746-overview.rst: WARNING: document isn't included in any toctree Documentation/arm/stm32/stm32f769-overview.rst: WARNING: document isn't included in any toctree Documentation/arm/stm32/stm32h743-overview.rst: WARNING: document isn't included in any toctree Documentation/arm/stm32/stm32mp157-overview.rst: WARNING: document isn't included in any toctree Documentation/gpu/msm-crash-dump.rst: WARNING: document isn't included in any toctree Documentation/interconnect/interconnect.rst: WARNING: document isn't included in any toctree Documentation/laptops/lg-laptop.rst: WARNING: document isn't included in any toctree Documentation/powerpc/isa-versions.rst: WARNING: document isn't included in any toctree Documentation/virtual/kvm/amd-memory-encryption.rst: WARNING: document isn't included in any toctree Documentation/virtual/kvm/vcpu-requests.rst: WARNING: document isn't included in any toctree So, while they aren't in any toctree, add :orphan: to them in order to silence this warning.
Signed-off-by: Mauro Carvalho Chehab Acked-by: Andrew Donnellan Signed-off-by: Jonathan Corbet --- Documentation/accelerators/ocxl.rst | 2 ++ Documentation/arm/stm32/overview.rst | 2 ++ Documentation/arm/stm32/stm32f429-overview.rst | 2 ++ Documentation/arm/stm32/stm32f746-overview.rst | 2 ++ Documentation/arm/stm32/stm32f769-overview.rst | 2 ++ Documentation/arm/stm32/stm32h743-overview.rst | 2 ++ Documentation/arm/stm32/stm32mp157-overview.rst | 2 ++ Documentation/gpu/msm-crash-dump.rst | 2 ++ Documentation/interconnect/interconnect.rst | 2 ++ Documentation/laptops/lg-laptop.rst | 2 ++ Documentation/powerpc/isa-versions.rst | 2 ++ 11 files changed, 22 insertions(+) diff --git a/Documentation/accelerators/ocxl.rst b/Documentation/accelerators/ocxl.rst index 14cefc020e2d..b1cea19a90f5 100644 --- a/Documentation/accelerators/ocxl.rst +++ b/Documentation/accelerators/ocxl.rst @@ -1,3 +1,5 @@ +:orphan: + ======================================================== OpenCAPI (Open Coherent Accelerator Processor Interface) ======================================================== diff --git a/Documentation/arm/stm32/overview.rst b/Documentation/arm/stm32/overview.rst index 85cfc8410798..f7e734153860 100644 --- a/Documentation/arm/stm32/overview.rst +++ b/Documentation/arm/stm32/overview.rst @@ -1,3 +1,5 @@ +:orphan: + ======================== STM32 ARM Linux Overview ======================== diff --git a/Documentation/arm/stm32/stm32f429-overview.rst b/Documentation/arm/stm32/stm32f429-overview.rst index 18feda97f483..65bbb1c3b423 100644 --- a/Documentation/arm/stm32/stm32f429-overview.rst +++ b/Documentation/arm/stm32/stm32f429-overview.rst @@ -1,3 +1,5 @@ +:orphan: + STM32F429 Overview ================== diff --git a/Documentation/arm/stm32/stm32f746-overview.rst b/Documentation/arm/stm32/stm32f746-overview.rst index b5f4b6ce7656..42d593085015 100644 --- a/Documentation/arm/stm32/stm32f746-overview.rst +++ b/Documentation/arm/stm32/stm32f746-overview.rst @@ -1,3 +1,5 @@ 
+:orphan: + STM32F746 Overview ================== diff --git a/Documentation/arm/stm32/stm32f769-overview.rst b/Documentation/arm/stm32/stm32f769-overview.rst index 228656ced2fe..f6adac862b17 100644 --- a/Documentation/arm/stm32/stm32f769-overview.rst +++ b/Documentation/arm/stm32/stm32f769-overview.rst @@ -1,3 +1,5 @@ +:orphan: + STM32F769 Overview ================== diff --git a/Documentation/arm/stm32/stm32h743-overview.rst b/Documentation/arm/stm32/stm32h743-overview.rst index 3458dc00095d..c525835e7473 100644 --- a/Documentation/arm/stm32/stm32h743-overview.rst +++ b/Documentation/arm/stm32/stm32h743-overview.rst @@ -1,3 +1,5 @@ +:orphan: + STM32H743 Overview ================== diff --git a/Documentation/arm/stm32/stm32mp157-overview.rst b/Documentation/arm/stm32/stm32mp157-overview.rst index 62e176d47ca7..2c52cd020601 100644 --- a/Documentation/arm/stm32/stm32mp157-overview.rst +++ b/Documentation/arm/stm32/stm32mp157-overview.rst @@ -1,3 +1,5 @@ +:orphan: + STM32MP157 Overview =================== diff --git a/Documentation/gpu/msm-crash-dump.rst b/Documentation/gpu/msm-crash-dump.rst index 757cd257e0d8..240ef200f76c 100644 --- a/Documentation/gpu/msm-crash-dump.rst +++ b/Documentation/gpu/msm-crash-dump.rst @@ -1,3 +1,5 @@ +:orphan: + ===================== MSM Crash Dump Format ===================== diff --git a/Documentation/interconnect/interconnect.rst b/Documentation/interconnect/interconnect.rst index c3e004893796..56e331dab70e 100644 --- a/Documentation/interconnect/interconnect.rst +++ b/Documentation/interconnect/interconnect.rst @@ -1,5 +1,7 @@ .. SPDX-License-Identifier: GPL-2.0 +:orphan: + ===================================== GENERIC SYSTEM INTERCONNECT SUBSYSTEM ===================================== diff --git a/Documentation/laptops/lg-laptop.rst b/Documentation/laptops/lg-laptop.rst index aa503ee9b3bc..f2c2ffe31101 100644 --- a/Documentation/laptops/lg-laptop.rst +++ b/Documentation/laptops/lg-laptop.rst @@ -1,5 +1,7 @@ .. 
SPDX-License-Identifier: GPL-2.0+ +:orphan: + LG Gram laptop extra features ============================= diff --git a/Documentation/powerpc/isa-versions.rst b/Documentation/powerpc/isa-versions.rst index 812e20cc898c..66c24140ebf1 100644 --- a/Documentation/powerpc/isa-versions.rst +++ b/Documentation/powerpc/isa-versions.rst @@ -1,3 +1,5 @@ +:orphan: + CPU to ISA Version Mapping ========================== From f672febc3d132ea0487c63367455124dfa39e30f Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 7 Jun 2019 15:54:23 -0300 Subject: [PATCH 055/129] docs: amd-memory-encryption.rst get rid of warnings Get rid of these warnings: Documentation/virtual/kvm/amd-memory-encryption.rst:244: WARNING: Citation [white-paper] is not referenced. Documentation/virtual/kvm/amd-memory-encryption.rst:246: WARNING: Citation [amd-apm] is not referenced. Documentation/virtual/kvm/amd-memory-encryption.rst:247: WARNING: Citation [kvm-forum] is not referenced. Fix them by adding an explicit reference to the citations that aren't mentioned in the text. Signed-off-by: Mauro Carvalho Chehab Acked-by: Paolo Bonzini Signed-off-by: Jonathan Corbet --- Documentation/virtual/kvm/amd-memory-encryption.rst | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Documentation/virtual/kvm/amd-memory-encryption.rst b/Documentation/virtual/kvm/amd-memory-encryption.rst index 659bbc093b52..d18c97b4e140 100644 --- a/Documentation/virtual/kvm/amd-memory-encryption.rst +++ b/Documentation/virtual/kvm/amd-memory-encryption.rst @@ -241,6 +241,9 @@ Returns: 0 on success, -negative on error References ========== + +See [white-paper]_, [api-spec]_, [amd-apm]_ and [kvm-forum]_ for more info. + .. [white-paper] http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf .. [api-spec] http://support.amd.com/TechDocs/55766_SEV-KM_API_Specification.pdf ..
[amd-apm] http://support.amd.com/TechDocs/24593.pdf (section 15.34) From d0727cc650f38243c0ac63fd8c91bfd63e3e2578 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 7 Jun 2019 15:54:24 -0300 Subject: [PATCH 056/129] docs: zh_CN: avoid duplicate citation references Documentation/process/management-style.rst:35: WARNING: duplicate label decisions, other instance in Documentation/translations/zh_CN/process/management-style.rst Documentation/process/programming-language.rst:37: WARNING: duplicate citation c-language, other instance in Documentation/translations/zh_CN/process/programming-language.rst Documentation/process/programming-language.rst:38: WARNING: duplicate citation gcc, other instance in Documentation/translations/zh_CN/process/programming-language.rst Documentation/process/programming-language.rst:39: WARNING: duplicate citation clang, other instance in Documentation/translations/zh_CN/process/programming-language.rst Documentation/process/programming-language.rst:40: WARNING: duplicate citation icc, other instance in Documentation/translations/zh_CN/process/programming-language.rst Documentation/process/programming-language.rst:41: WARNING: duplicate citation gcc-c-dialect-options, other instance in Documentation/translations/zh_CN/process/programming-language.rst Documentation/process/programming-language.rst:42: WARNING: duplicate citation gnu-extensions, other instance in Documentation/translations/zh_CN/process/programming-language.rst Documentation/process/programming-language.rst:43: WARNING: duplicate citation gcc-attribute-syntax, other instance in Documentation/translations/zh_CN/process/programming-language.rst Documentation/process/programming-language.rst:44: WARNING: duplicate citation n2049, other instance in Documentation/translations/zh_CN/process/programming-language.rst Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- .../zh_CN/process/management-style.rst | 4 +- 
.../zh_CN/process/programming-language.rst | 59 ++++++++++++++----- 2 files changed, 47 insertions(+), 16 deletions(-) diff --git a/Documentation/translations/zh_CN/process/management-style.rst b/Documentation/translations/zh_CN/process/management-style.rst index a181fa56d19e..c6a5bb285797 100644 --- a/Documentation/translations/zh_CN/process/management-style.rst +++ b/Documentation/translations/zh_CN/process/management-style.rst @@ -28,7 +28,7 @@ Linux内核管理风格 不管怎样,这里是: -.. _decisions: +.. _cn_decisions: 1)决策 ------- @@ -108,7 +108,7 @@ Linux内核管理风格 但是,为了做好作为内核管理者的准备,最好记住不要烧掉任何桥梁,不要轰炸任何 无辜的村民,也不要疏远太多的内核开发人员。事实证明,疏远人是相当容易的,而 亲近一个疏远的人是很难的。因此,“疏远”立即属于“不可逆”的范畴,并根据 -:ref:`decisions` 成为绝不可以做的事情。 +:ref:`cn_decisions` 成为绝不可以做的事情。 这里只有几个简单的规则: diff --git a/Documentation/translations/zh_CN/process/programming-language.rst b/Documentation/translations/zh_CN/process/programming-language.rst index 51fd4ef48ea1..2a47a1d2ec20 100644 --- a/Documentation/translations/zh_CN/process/programming-language.rst +++ b/Documentation/translations/zh_CN/process/programming-language.rst @@ -8,21 +8,21 @@ 程序设计语言 ============ -内核是用C语言 [c-language]_ 编写的。更准确地说,内核通常是用 ``gcc`` [gcc]_ -在 ``-std=gnu89`` [gcc-c-dialect-options]_ 下编译的:ISO C90的 GNU 方言( +内核是用C语言 :ref:`c-language ` 编写的。更准确地说,内核通常是用 :ref:`gcc ` +在 ``-std=gnu89`` :ref:`gcc-c-dialect-options ` 下编译的:ISO C90的 GNU 方言( 包括一些C99特性) -这种方言包含对语言 [gnu-extensions]_ 的许多扩展,当然,它们许多都在内核中使用。 +这种方言包含对语言 :ref:`gnu-extensions ` 的许多扩展,当然,它们许多都在内核中使用。 -对于一些体系结构,有一些使用 ``clang`` [clang]_ 和 ``icc`` [icc]_ 编译内核 +对于一些体系结构,有一些使用 :ref:`clang ` 和 :ref:`icc ` 编译内核 的支持,尽管在编写此文档时还没有完成,仍需要第三方补丁。 属性 ---- -在整个内核中使用的一个常见扩展是属性(attributes) [gcc-attribute-syntax]_ +在整个内核中使用的一个常见扩展是属性(attributes) :ref:`gcc-attribute-syntax ` 属性允许将实现定义的语义引入语言实体(如变量、函数或类型),而无需对语言进行 -重大的语法更改(例如添加新关键字) [n2049]_ +重大的语法更改(例如添加新关键字) :ref:`n2049 ` 在某些情况下,属性是可选的(即不支持这些属性的编译器仍然应该生成正确的代码, 即使其速度较慢或执行的编译时检查/诊断次数不够) @@ -31,11 +31,42 @@ ``__attribute__((__pure__))`` ),以检测可以使用哪些关键字和/或缩短代码, 具体 请参阅 
``include/linux/compiler_attributes.h`` -.. [c-language] http://www.open-std.org/jtc1/sc22/wg14/www/standards -.. [gcc] https://gcc.gnu.org -.. [clang] https://clang.llvm.org -.. [icc] https://software.intel.com/en-us/c-compilers -.. [gcc-c-dialect-options] https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html -.. [gnu-extensions] https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html -.. [gcc-attribute-syntax] https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html -.. [n2049] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2049.pdf +.. _cn_c-language: + +c-language + http://www.open-std.org/jtc1/sc22/wg14/www/standards + +.. _cn_gcc: + +gcc + https://gcc.gnu.org + +.. _cn_clang: + +clang + https://clang.llvm.org + +.. _cn_icc: + +icc + https://software.intel.com/en-us/c-compilers + +.. _cn_gcc-c-dialect-options: + +c-dialect-options + https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html + +.. _cn_gnu-extensions: + +gnu-extensions + https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html + +.. _cn_gcc-attribute-syntax: + +gcc-attribute-syntax + https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html + +.. _cn_n2049: + +n2049 + http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2049.pdf From ea0ad8763b17395fc611f6d91d1de389ec0cc584 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 7 Jun 2019 15:54:25 -0300 Subject: [PATCH 057/129] docs: it: license-rules.rst: get rid of warnings A code block is wrongly indented, and the document tries to use a reference that was not defined in the Italian translation. Documentation/translations/it_IT/process/license-rules.rst:329: WARNING: Literal block expected; none found. Documentation/translations/it_IT/process/license-rules.rst:332: WARNING: Unexpected indentation. Documentation/translations/it_IT/process/license-rules.rst:339: WARNING: Block quote ends without a blank line; unexpected unindent. Documentation/translations/it_IT/process/license-rules.rst:341: WARNING: Unexpected indentation.
Documentation/translations/it_IT/process/license-rules.rst:305: WARNING: Unknown target name: "metatags". Signed-off-by: Mauro Carvalho Chehab Reviewed-by: Federico Vaga Signed-off-by: Jonathan Corbet --- .../it_IT/process/license-rules.rst | 28 +++++++++---------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/Documentation/translations/it_IT/process/license-rules.rst b/Documentation/translations/it_IT/process/license-rules.rst index f058e06996dc..4cd87a3a7bf9 100644 --- a/Documentation/translations/it_IT/process/license-rules.rst +++ b/Documentation/translations/it_IT/process/license-rules.rst @@ -303,7 +303,7 @@ essere categorizzate in: LICENSES/dual I file in questa cartella contengono il testo completo della rispettiva - licenza e i suoi `Metatags`_. I nomi dei file sono identici agli + licenza e i suoi `Metatag`_. I nomi dei file sono identici agli identificatori di licenza SPDX che dovrebbero essere usati nei file sorgenti. @@ -326,19 +326,19 @@ essere categorizzate in: Esempio del formato del file:: - Valid-License-Identifier: MPL-1.1 - SPDX-URL: https://spdx.org/licenses/MPL-1.1.html - Usage-Guide: - Do NOT use. The MPL-1.1 is not GPL2 compatible. It may only be used for - dual-licensed files where the other license is GPL2 compatible. - If you end up using this it MUST be used together with a GPL2 compatible - license using "OR". - To use the Mozilla Public License version 1.1 put the following SPDX - tag/value pair into a comment according to the placement guidelines in - the licensing rules documentation: - SPDX-License-Identifier: MPL-1.1 - License-Text: - Full license text + Valid-License-Identifier: MPL-1.1 + SPDX-URL: https://spdx.org/licenses/MPL-1.1.html + Usage-Guide: + Do NOT use. The MPL-1.1 is not GPL2 compatible. It may only be used for + dual-licensed files where the other license is GPL2 compatible. + If you end up using this it MUST be used together with a GPL2 compatible + license using "OR". 
+ To use the Mozilla Public License version 1.1 put the following SPDX + tag/value pair into a comment according to the placement guidelines in + the licensing rules documentation: + SPDX-License-Identifier: MPL-1.1 + License-Text: + Full license text | From 6ad8b21652ec26a5ad51ffc91470e15c19156548 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 7 Jun 2019 15:54:27 -0300 Subject: [PATCH 058/129] docs: security: trusted-encrypted.rst: fix code-block tag The code-block tag is at the wrong place, causing those warnings: Documentation/security/keys/trusted-encrypted.rst:112: WARNING: Literal block expected; none found. Documentation/security/keys/trusted-encrypted.rst:121: WARNING: Unexpected indentation. Documentation/security/keys/trusted-encrypted.rst:122: WARNING: Block quote ends without a blank line; unexpected unindent. Documentation/security/keys/trusted-encrypted.rst:123: WARNING: Block quote ends without a blank line; unexpected unindent. Signed-off-by: Mauro Carvalho Chehab Acked-by: James Morris Acked-by: Jarkko Sakkinen Signed-off-by: Jonathan Corbet --- Documentation/security/keys/trusted-encrypted.rst | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/Documentation/security/keys/trusted-encrypted.rst b/Documentation/security/keys/trusted-encrypted.rst index 7b35fcb58933..50ac8bcd6970 100644 --- a/Documentation/security/keys/trusted-encrypted.rst +++ b/Documentation/security/keys/trusted-encrypted.rst @@ -107,12 +107,14 @@ Where:: Examples of trusted and encrypted key usage: -Create and save a trusted key named "kmk" of length 32 bytes:: +Create and save a trusted key named "kmk" of length 32 bytes. Note: When using a TPM 2.0 with a persistent key with handle 0x81000001, append 'keyhandle=0x81000001' to statements between quotes, such as "new 32 keyhandle=0x81000001". 
+:: + $ keyctl add trusted kmk "new 32" @u 440502848 From 43415f13276f09623b1b61376c6f2e43f71bedbb Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 7 Jun 2019 15:54:28 -0300 Subject: [PATCH 059/129] docs: security: core.rst: Fix several warnings Multi-line literal markups only work when they're indented at the same level, which is not the case here: Documentation/security/keys/core.rst:1597: WARNING: Inline literal start-string without end-string. Documentation/security/keys/core.rst:1597: WARNING: Inline emphasis start-string without end-string. Documentation/security/keys/core.rst:1597: WARNING: Inline emphasis start-string without end-string. Documentation/security/keys/core.rst:1598: WARNING: Inline emphasis start-string without end-string. Documentation/security/keys/core.rst:1598: WARNING: Inline emphasis start-string without end-string. Documentation/security/keys/core.rst:1600: WARNING: Inline literal start-string without end-string. Documentation/security/keys/core.rst:1600: WARNING: Inline emphasis start-string without end-string. Documentation/security/keys/core.rst:1600: WARNING: Inline emphasis start-string without end-string. Documentation/security/keys/core.rst:1600: WARNING: Inline emphasis start-string without end-string. Documentation/security/keys/core.rst:1600: WARNING: Inline emphasis start-string without end-string. Documentation/security/keys/core.rst:1666: WARNING: Inline literal start-string without end-string. Documentation/security/keys/core.rst:1666: WARNING: Inline emphasis start-string without end-string. Documentation/security/keys/core.rst:1666: WARNING: Inline emphasis start-string without end-string. Documentation/security/keys/core.rst:1666: WARNING: Inline emphasis start-string without end-string. Fix it by using a code-block instead.
Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- Documentation/security/keys/core.rst | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/Documentation/security/keys/core.rst b/Documentation/security/keys/core.rst index 9521c4207f01..3fd60dcb2dc6 100644 --- a/Documentation/security/keys/core.rst +++ b/Documentation/security/keys/core.rst @@ -1594,10 +1594,12 @@ The structure has a number of fields, some of which are mandatory: attempted key link operation. If there is no match, -EINVAL is returned. - * ``int (*asym_eds_op)(struct kernel_pkey_params *params, - const void *in, void *out);`` - ``int (*asym_verify_signature)(struct kernel_pkey_params *params, - const void *in, const void *in2);`` + * ``asym_eds_op`` and ``asym_verify_signature``:: + + int (*asym_eds_op)(struct kernel_pkey_params *params, + const void *in, void *out); + int (*asym_verify_signature)(struct kernel_pkey_params *params, + const void *in, const void *in2); These methods are optional. If provided the first allows a key to be used to encrypt, decrypt or sign a blob of data, and the second allows a @@ -1662,8 +1664,10 @@ The structure has a number of fields, some of which are mandatory: required crypto isn't available. - * ``int (*asym_query)(const struct kernel_pkey_params *params, - struct kernel_pkey_query *info);`` + * ``asym_query``:: + + int (*asym_query)(const struct kernel_pkey_params *params, + struct kernel_pkey_query *info); This method is optional. If provided it allows information about the public or asymmetric key held in the key to be determined. 
From c6fff4d3b2f467dd62ee8c69e49c8a8795fe7400 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 7 Jun 2019 15:54:30 -0300 Subject: [PATCH 060/129] docs: net: sja1105.rst: fix table format There's a table there which produces two warnings when built with Sphinx: Documentation/networking/dsa/sja1105.rst:91: WARNING: Block quote ends without a blank line; unexpected unindent. Documentation/networking/dsa/sja1105.rst:91: WARNING: Block quote ends without a blank line; unexpected unindent. It will still produce a table, but the HTML output is wrong, as it won't interpret the second line as a continuation of the first one, because the indentation doesn't match. After the change, the output looks much better and the two warnings are gone. Signed-off-by: Mauro Carvalho Chehab Acked-by: Vladimir Oltean Signed-off-by: Jonathan Corbet --- Documentation/networking/dsa/sja1105.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/networking/dsa/sja1105.rst b/Documentation/networking/dsa/sja1105.rst index ea7bac438cfd..cb2858dece93 100644 --- a/Documentation/networking/dsa/sja1105.rst +++ b/Documentation/networking/dsa/sja1105.rst @@ -86,13 +86,13 @@ functionality.
The following traffic modes are supported over the switch netdevices: +--------------------+------------+------------------+------------------+ -| | Standalone | Bridged with | Bridged with | -| | ports | vlan_filtering 0 | vlan_filtering 1 | +| | Standalone | Bridged with | Bridged with | +| | ports | vlan_filtering 0 | vlan_filtering 1 | +====================+============+==================+==================+ | Regular traffic | Yes | Yes | No (use master) | +--------------------+------------+------------------+------------------+ | Management traffic | Yes | Yes | Yes | -| (BPDU, PTP) | | | | +| (BPDU, PTP) | | | | +--------------------+------------+------------------+------------------+ Switching features From 14b767430a58046bfef8ff9b9f12854e20343092 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 7 Jun 2019 15:54:29 -0300 Subject: [PATCH 061/129] docs: net: dpio-driver.rst: fix two codeblock warnings Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst:43: WARNING: Definition list ends without a blank line; unexpected unindent. Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst:63: WARNING: Unexpected indentation. looking for now-outdated files... 
none found Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- .../networking/device_drivers/freescale/dpaa2/dpio-driver.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst b/Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst index 5045df990a4c..17dbee1ac53e 100644 --- a/Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst +++ b/Documentation/networking/device_drivers/freescale/dpaa2/dpio-driver.rst @@ -39,8 +39,7 @@ The Linux DPIO driver consists of 3 primary components-- DPIO service-- provides APIs to other Linux drivers for services - QBman portal interface-- sends portal commands, gets responses -:: + QBman portal interface-- sends portal commands, gets responses:: fsl-mc other bus drivers @@ -60,6 +59,7 @@ The Linux DPIO driver consists of 3 primary components-- The diagram below shows how the DPIO driver components fit with the other DPAA2 Linux driver components:: + +------------+ | OS Network | | Stack | From 1eecbcdca2bd8d96881cace19ad105dc0f0263f5 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 7 Jun 2019 15:54:31 -0300 Subject: [PATCH 062/129] docs: move protection-keys.rst to the core-api book This document is used by multiple architectures: $ echo $(git grep -l pkey_mprotect arch|cut -d'/' -f 2|sort|uniq) alpha arm arm64 ia64 m68k microblaze mips parisc powerpc s390 sh sparc x86 xtensa So, let's move it to the core book and adjust the links to it accordingly. 
Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- Documentation/core-api/index.rst | 1 + Documentation/{x86 => core-api}/protection-keys.rst | 0 Documentation/x86/index.rst | 1 - arch/powerpc/Kconfig | 2 +- arch/x86/Kconfig | 2 +- tools/testing/selftests/x86/protection_keys.c | 2 +- 6 files changed, 4 insertions(+), 4 deletions(-) rename Documentation/{x86 => core-api}/protection-keys.rst (100%) diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst index ee1bb8983a88..2466a4c51031 100644 --- a/Documentation/core-api/index.rst +++ b/Documentation/core-api/index.rst @@ -34,6 +34,7 @@ Core utilities timekeeping boot-time-mm memory-hotplug + protection-keys Interfaces for kernel debugging diff --git a/Documentation/x86/protection-keys.rst b/Documentation/core-api/protection-keys.rst similarity index 100% rename from Documentation/x86/protection-keys.rst rename to Documentation/core-api/protection-keys.rst diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst index ae36fc5fc649..f2de1b2d3ac7 100644 --- a/Documentation/x86/index.rst +++ b/Documentation/x86/index.rst @@ -19,7 +19,6 @@ x86-specific Documentation tlb mtrr pat - protection-keys intel_mpx amd-memory-encryption pti diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 8c1c636308c8..3b795a0cab62 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -898,7 +898,7 @@ config PPC_MEM_KEYS page-based protections, but without requiring modification of the page tables when an application changes protection domains. - For details, see Documentation/vm/protection-keys.rst + For details, see Documentation/core-api/protection-keys.rst If unsure, say y. 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 2bbbd4d1ba31..d87d53fcd261 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1911,7 +1911,7 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS page-based protections, but without requiring modification of the page tables when an application changes protection domains. - For details, see Documentation/x86/protection-keys.txt + For details, see Documentation/core-api/protection-keys.rst If unsure, say y. diff --git a/tools/testing/selftests/x86/protection_keys.c b/tools/testing/selftests/x86/protection_keys.c index 5d546dcdbc80..480995bceefa 100644 --- a/tools/testing/selftests/x86/protection_keys.c +++ b/tools/testing/selftests/x86/protection_keys.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 /* - * Tests x86 Memory Protection Keys (see Documentation/x86/protection-keys.txt) + * Tests x86 Memory Protection Keys (see Documentation/core-api/protection-keys.rst) * * There are examples in here of: * * how to set protection keys on memory From cb1aaebea8d79860181559d7b5d482aea63db113 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 7 Jun 2019 15:54:32 -0300 Subject: [PATCH 063/129] docs: fix broken documentation links Mostly due to the x86 and ACPI conversions, several documentation links still point to the old files. Fix them.
Signed-off-by: Mauro Carvalho Chehab Reviewed-by: Wolfram Sang Reviewed-by: Sven Van Asbroeck Reviewed-by: Bhupesh Sharma Acked-by: Mark Brown Signed-off-by: Jonathan Corbet --- Documentation/acpi/dsd/leds.txt | 2 +- Documentation/admin-guide/kernel-parameters.rst | 6 +++--- Documentation/admin-guide/kernel-parameters.txt | 16 ++++++++-------- Documentation/admin-guide/ras.rst | 2 +- .../devicetree/bindings/net/fsl-enetc.txt | 7 +++---- .../bindings/pci/amlogic,meson-pcie.txt | 2 +- .../bindings/regulator/qcom,rpmh-regulator.txt | 2 +- Documentation/devicetree/booting-without-of.txt | 2 +- Documentation/driver-api/gpio/board.rst | 2 +- Documentation/driver-api/gpio/consumer.rst | 2 +- .../firmware-guide/acpi/enumeration.rst | 2 +- .../firmware-guide/acpi/method-tracing.rst | 2 +- Documentation/i2c/instantiating-devices | 2 +- Documentation/sysctl/kernel.txt | 4 ++-- .../translations/zh_CN/process/4.Coding.rst | 2 +- Documentation/x86/x86_64/5level-paging.rst | 2 +- Documentation/x86/x86_64/boot-options.rst | 4 ++-- .../x86/x86_64/fake-numa-for-cpusets.rst | 2 +- MAINTAINERS | 4 ++-- arch/arm/Kconfig | 2 +- arch/arm64/kernel/kexec_image.c | 2 +- arch/x86/Kconfig | 14 +++++++------- arch/x86/Kconfig.debug | 2 +- arch/x86/boot/header.S | 2 +- arch/x86/entry/entry_64.S | 2 +- arch/x86/include/asm/bootparam_utils.h | 2 +- arch/x86/include/asm/page_64_types.h | 2 +- arch/x86/include/asm/pgtable_64_types.h | 2 +- arch/x86/kernel/cpu/microcode/amd.c | 2 +- arch/x86/kernel/kexec-bzimage64.c | 2 +- arch/x86/kernel/pci-dma.c | 2 +- arch/x86/mm/tlb.c | 2 +- arch/x86/platform/pvh/enlighten.c | 2 +- drivers/acpi/Kconfig | 10 +++++----- drivers/net/ethernet/faraday/ftgmac100.c | 2 +- .../fieldbus/Documentation/fieldbus_dev.txt | 4 ++-- drivers/vhost/vhost.c | 2 +- include/acpi/acpi_drivers.h | 2 +- include/linux/fs_context.h | 2 +- include/linux/lsm_hooks.h | 2 +- mm/Kconfig | 2 +- security/Kconfig | 2 +- tools/include/linux/err.h | 2 +- 
tools/objtool/Documentation/stack-validation.txt | 4 ++-- 44 files changed, 70 insertions(+), 71 deletions(-) diff --git a/Documentation/acpi/dsd/leds.txt b/Documentation/acpi/dsd/leds.txt index 81a63af42ed2..cc58b1a574c5 100644 --- a/Documentation/acpi/dsd/leds.txt +++ b/Documentation/acpi/dsd/leds.txt @@ -96,4 +96,4 @@ where , referenced 2019-02-21. -[7] Documentation/acpi/dsd/data-node-reference.txt +[7] Documentation/firmware-guide/acpi/dsd/data-node-references.rst diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst index 0124980dca2d..8d3273e32eb1 100644 --- a/Documentation/admin-guide/kernel-parameters.rst +++ b/Documentation/admin-guide/kernel-parameters.rst @@ -167,7 +167,7 @@ parameter is applicable:: X86-32 X86-32, aka i386 architecture is enabled. X86-64 X86-64 architecture is enabled. More X86-64 boot options can be found in - Documentation/x86/x86_64/boot-options.txt . + Documentation/x86/x86_64/boot-options.rst. X86 Either 32-bit or 64-bit x86 (same as X86-32+X86-64) X86_UV SGI UV support is enabled. XEN Xen support is enabled @@ -181,10 +181,10 @@ In addition, the following text indicates that the option:: Parameters denoted with BOOT are actually interpreted by the boot loader, and have no meaning to the kernel directly. Do not modify the syntax of boot loader parameters without extreme -need or coordination with . +need or coordination with . There are also arch-specific kernel-parameters not documented here. -See for example . +See for example . 
Note that ALL kernel parameters listed below are CASE SENSITIVE, and that a trailing = on the name of any parameter states that that parameter will diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 79d043b8850d..1abd7e145357 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -53,7 +53,7 @@ ACPI_DEBUG_PRINT statements, e.g., ACPI_DEBUG_PRINT((ACPI_DB_INFO, ... The debug_level mask defaults to "info". See - Documentation/acpi/debug.txt for more information about + Documentation/firmware-guide/acpi/debug.rst for more information about debug layers and levels. Enable processor driver info messages: @@ -963,7 +963,7 @@ for details. nompx [X86] Disables Intel Memory Protection Extensions. - See Documentation/x86/intel_mpx.txt for more + See Documentation/x86/intel_mpx.rst for more information about the feature. nopku [X86] Disable Memory Protection Keys CPU feature found @@ -1189,7 +1189,7 @@ that is to be dynamically loaded by Linux. If there are multiple variables with the same name but with different vendor GUIDs, all of them will be loaded. See - Documentation/acpi/ssdt-overlays.txt for details. + Documentation/admin-guide/acpi/ssdt-overlays.rst for details. eisa_irq_edge= [PARISC,HW] @@ -2383,7 +2383,7 @@ mce [X86-32] Machine Check Exception - mce=option [X86-64] See Documentation/x86/x86_64/boot-options.txt + mce=option [X86-64] See Documentation/x86/x86_64/boot-options.rst md= [HW] RAID subsystems devices and level See Documentation/admin-guide/md.rst. @@ -2439,7 +2439,7 @@ set according to the CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE kernel config option. - See Documentation/memory-hotplug.txt. + See Documentation/admin-guide/mm/memory-hotplug.rst. memmap=exactmap [KNL,X86] Enable setting of an exact E820 memory map, as specified by the user. 
@@ -2528,7 +2528,7 @@ mem_encrypt=on: Activate SME mem_encrypt=off: Do not activate SME - Refer to Documentation/x86/amd-memory-encryption.txt + Refer to Documentation/virtual/kvm/amd-memory-encryption.rst for details on when memory encryption can be activated. mem_sleep_default= [SUSPEND] Default system suspend mode: @@ -3529,7 +3529,7 @@ See Documentation/blockdev/paride.txt. pirq= [SMP,APIC] Manual mp-table setup - See Documentation/x86/i386/IO-APIC.txt. + See Documentation/x86/i386/IO-APIC.rst. plip= [PPT,NET] Parallel port network link Format: { parport | timid | 0 } @@ -5055,7 +5055,7 @@ Can be used multiple times for multiple devices. vga= [BOOT,X86-32] Select a particular video mode - See Documentation/x86/boot.txt and + See Documentation/x86/boot.rst and Documentation/svga.txt. Use vga=ask for menu. This is actually a boot loader parameter; the value is diff --git a/Documentation/admin-guide/ras.rst b/Documentation/admin-guide/ras.rst index c7495e42e6f4..2b20f5f7380d 100644 --- a/Documentation/admin-guide/ras.rst +++ b/Documentation/admin-guide/ras.rst @@ -199,7 +199,7 @@ Architecture (MCA)\ [#f3]_. mode). .. [#f3] For more details about the Machine Check Architecture (MCA), - please read Documentation/x86/x86_64/machinecheck at the Kernel tree. + please read Documentation/x86/x86_64/machinecheck.rst at the Kernel tree. EDAC - Error Detection And Correction ************************************* diff --git a/Documentation/devicetree/bindings/net/fsl-enetc.txt b/Documentation/devicetree/bindings/net/fsl-enetc.txt index c812e25ae90f..25fc687419db 100644 --- a/Documentation/devicetree/bindings/net/fsl-enetc.txt +++ b/Documentation/devicetree/bindings/net/fsl-enetc.txt @@ -16,8 +16,8 @@ Required properties: In this case, the ENETC node should include a "mdio" sub-node that in turn should contain the "ethernet-phy" node describing the external phy. 
Below properties are required, their bindings -already defined in ethernet.txt or phy.txt, under -Documentation/devicetree/bindings/net/*. +already defined in Documentation/devicetree/bindings/net/ethernet.txt or +Documentation/devicetree/bindings/net/phy.txt. Required: @@ -51,8 +51,7 @@ Example: connection: In this case, the ENETC port node defines a fixed link connection, -as specified by "fixed-link.txt", under -Documentation/devicetree/bindings/net/*. +as specified by Documentation/devicetree/bindings/net/fixed-link.txt. Required: diff --git a/Documentation/devicetree/bindings/pci/amlogic,meson-pcie.txt b/Documentation/devicetree/bindings/pci/amlogic,meson-pcie.txt index 12b18f82d441..efa2c8b9b85a 100644 --- a/Documentation/devicetree/bindings/pci/amlogic,meson-pcie.txt +++ b/Documentation/devicetree/bindings/pci/amlogic,meson-pcie.txt @@ -3,7 +3,7 @@ Amlogic Meson AXG DWC PCIE SoC controller Amlogic Meson PCIe host controller is based on the Synopsys DesignWare PCI core. It shares common functions with the PCIe DesignWare core driver and inherits common properties defined in -Documentation/devicetree/bindings/pci/designware-pci.txt. +Documentation/devicetree/bindings/pci/designware-pcie.txt. Additional properties are described here: diff --git a/Documentation/devicetree/bindings/regulator/qcom,rpmh-regulator.txt b/Documentation/devicetree/bindings/regulator/qcom,rpmh-regulator.txt index 7ef2dbe48e8a..14d2eee96b3d 100644 --- a/Documentation/devicetree/bindings/regulator/qcom,rpmh-regulator.txt +++ b/Documentation/devicetree/bindings/regulator/qcom,rpmh-regulator.txt @@ -97,7 +97,7 @@ Second Level Nodes - Regulators sent for this regulator including those which are for a strictly lower power state. -Other properties defined in Documentation/devicetree/bindings/regulator.txt +Other properties defined in Documentation/devicetree/bindings/regulator/regulator.txt may also be used. 
regulator-initial-mode and regulator-allowed-modes may be specified for VRM regulators using mode values from include/dt-bindings/regulator/qcom,rpmh-regulator.h. regulator-allow-bypass diff --git a/Documentation/devicetree/booting-without-of.txt b/Documentation/devicetree/booting-without-of.txt index e86bd2f64117..60f8640f2b2f 100644 --- a/Documentation/devicetree/booting-without-of.txt +++ b/Documentation/devicetree/booting-without-of.txt @@ -277,7 +277,7 @@ it with special cases. the decompressor (the real mode entry point goes to the same 32bit entry point once it switched into protected mode). That entry point supports one calling convention which is documented in - Documentation/x86/boot.txt + Documentation/x86/boot.rst The physical pointer to the device-tree block (defined in chapter II) is passed via setup_data which requires at least boot protocol 2.09. The type filed is defined as diff --git a/Documentation/driver-api/gpio/board.rst b/Documentation/driver-api/gpio/board.rst index b37f3f7b8926..ce91518bf9f4 100644 --- a/Documentation/driver-api/gpio/board.rst +++ b/Documentation/driver-api/gpio/board.rst @@ -101,7 +101,7 @@ with the help of _DSD (Device Specific Data), introduced in ACPI 5.1:: } For more information about the ACPI GPIO bindings see -Documentation/acpi/gpio-properties.txt. +Documentation/firmware-guide/acpi/gpio-properties.rst. Platform Data ------------- diff --git a/Documentation/driver-api/gpio/consumer.rst b/Documentation/driver-api/gpio/consumer.rst index 5e4d8aa68913..fdecb6d711db 100644 --- a/Documentation/driver-api/gpio/consumer.rst +++ b/Documentation/driver-api/gpio/consumer.rst @@ -437,7 +437,7 @@ case, it will be handled by the GPIO subsystem automatically. However, if the _DSD is not present, the mappings between GpioIo()/GpioInt() resources and GPIO connection IDs need to be provided by device drivers. 
-For details refer to Documentation/acpi/gpio-properties.txt +For details refer to Documentation/firmware-guide/acpi/gpio-properties.rst Interacting With the Legacy GPIO Subsystem diff --git a/Documentation/firmware-guide/acpi/enumeration.rst b/Documentation/firmware-guide/acpi/enumeration.rst index 850be9696931..1252617b520f 100644 --- a/Documentation/firmware-guide/acpi/enumeration.rst +++ b/Documentation/firmware-guide/acpi/enumeration.rst @@ -339,7 +339,7 @@ a code like this:: There are also devm_* versions of these functions which release the descriptors once the device is released. -See Documentation/acpi/gpio-properties.txt for more information about the +See Documentation/firmware-guide/acpi/gpio-properties.rst for more information about the _DSD binding related to GPIOs. MFD devices diff --git a/Documentation/firmware-guide/acpi/method-tracing.rst b/Documentation/firmware-guide/acpi/method-tracing.rst index d0b077b73f5f..0aa7e2c5d32a 100644 --- a/Documentation/firmware-guide/acpi/method-tracing.rst +++ b/Documentation/firmware-guide/acpi/method-tracing.rst @@ -68,7 +68,7 @@ c. Filter out the debug layer/level matched logs when the specified Where: 0xXXXXXXXX/0xYYYYYYYY - Refer to Documentation/acpi/debug.txt for possible debug layer/level + Refer to Documentation/firmware-guide/acpi/debug.rst for possible debug layer/level masking values. \PPPP.AAAA.TTTT.HHHH Full path of a control method that can be found in the ACPI namespace. diff --git a/Documentation/i2c/instantiating-devices b/Documentation/i2c/instantiating-devices index 0d85ac1935b7..5a3e2f331e8c 100644 --- a/Documentation/i2c/instantiating-devices +++ b/Documentation/i2c/instantiating-devices @@ -85,7 +85,7 @@ Method 1c: Declare the I2C devices via ACPI ------------------------------------------- ACPI can also describe I2C devices. There is special documentation for this -which is currently located at Documentation/acpi/enumeration.txt. 
+which is currently located at Documentation/firmware-guide/acpi/enumeration.rst. Method 2: Instantiate the devices explicitly diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index f0c86fbb3b48..92f7f34b021a 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -155,7 +155,7 @@ is 0x15 and the full version number is 0x234, this file will contain the value 340 = 0x154. See the type_of_loader and ext_loader_type fields in -Documentation/x86/boot.txt for additional information. +Documentation/x86/boot.rst for additional information. ============================================================== @@ -167,7 +167,7 @@ The complete bootloader version number. In the example above, this file will contain the value 564 = 0x234. See the type_of_loader and ext_loader_ver fields in -Documentation/x86/boot.txt for additional information. +Documentation/x86/boot.rst for additional information. ============================================================== diff --git a/Documentation/translations/zh_CN/process/4.Coding.rst b/Documentation/translations/zh_CN/process/4.Coding.rst index 5301e9d55255..8bb777941394 100644 --- a/Documentation/translations/zh_CN/process/4.Coding.rst +++ b/Documentation/translations/zh_CN/process/4.Coding.rst @@ -241,7 +241,7 @@ scripts/coccinelle目录下已经打包了相当多的内核“语义补丁” 任何添加新用户空间界面的代码(包括新的sysfs或/proc文件)都应该包含该界面的 文档,该文档使用户空间开发人员能够知道他们在使用什么。请参阅 -Documentation/abi/readme,了解如何格式化此文档以及需要提供哪些信息。 +Documentation/ABI/README,了解如何格式化此文档以及需要提供哪些信息。 文件 :ref:`Documentation/admin-guide/kernel-parameters.rst ` 描述了内核的所有引导时间参数。任何添加新参数的补丁都应该向该文件添加适当的 diff --git a/Documentation/x86/x86_64/5level-paging.rst b/Documentation/x86/x86_64/5level-paging.rst index ab88a4514163..44856417e6a5 100644 --- a/Documentation/x86/x86_64/5level-paging.rst +++ b/Documentation/x86/x86_64/5level-paging.rst @@ -20,7 +20,7 @@ physical address space. This "ought to be enough for anybody" ©. QEMU 2.9 and later support 5-level paging. 
Virtual memory layout for 5-level paging is described in -Documentation/x86/x86_64/mm.txt +Documentation/x86/x86_64/mm.rst Enabling 5-level paging diff --git a/Documentation/x86/x86_64/boot-options.rst b/Documentation/x86/x86_64/boot-options.rst index 2f69836b8445..6a4285a3c7a4 100644 --- a/Documentation/x86/x86_64/boot-options.rst +++ b/Documentation/x86/x86_64/boot-options.rst @@ -9,7 +9,7 @@ only the AMD64 specific ones are listed here. Machine check ============= -Please see Documentation/x86/x86_64/machinecheck for sysfs runtime tunables. +Please see Documentation/x86/x86_64/machinecheck.rst for sysfs runtime tunables. mce=off Disable machine check @@ -89,7 +89,7 @@ APICs Don't use the local APIC (alias for i386 compatibility) pirq=... - See Documentation/x86/i386/IO-APIC.txt + See Documentation/x86/i386/IO-APIC.rst noapictimer Don't set up the APIC timer diff --git a/Documentation/x86/x86_64/fake-numa-for-cpusets.rst b/Documentation/x86/x86_64/fake-numa-for-cpusets.rst index 74fbb78b3c67..04df57b9aa3f 100644 --- a/Documentation/x86/x86_64/fake-numa-for-cpusets.rst +++ b/Documentation/x86/x86_64/fake-numa-for-cpusets.rst @@ -18,7 +18,7 @@ For more information on the features of cpusets, see Documentation/cgroup-v1/cpusets.txt. There are a number of different configurations you can use for your needs. For more information on the numa=fake command line option and its various ways of -configuring fake nodes, see Documentation/x86/x86_64/boot-options.txt. +configuring fake nodes, see Documentation/x86/x86_64/boot-options.rst. For the purposes of this introduction, we'll assume a very primitive NUMA emulation setup of "numa=fake=4*512,". 
This will split our system memory into diff --git a/MAINTAINERS b/MAINTAINERS index 5cfbea4ce575..26e0369c1641 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3874,7 +3874,7 @@ F: Documentation/devicetree/bindings/hwmon/cirrus,lochnagar.txt F: Documentation/devicetree/bindings/pinctrl/cirrus,lochnagar.txt F: Documentation/devicetree/bindings/regulator/cirrus,lochnagar.txt F: Documentation/devicetree/bindings/sound/cirrus,lochnagar.txt -F: Documentation/hwmon/lochnagar +F: Documentation/hwmon/lochnagar.rst CISCO FCOE HBA DRIVER M: Satish Kharat @@ -11272,7 +11272,7 @@ NXP FXAS21002C DRIVER M: Rui Miguel Silva L: linux-iio@vger.kernel.org S: Maintained -F: Documentation/devicetree/bindings/iio/gyroscope/fxas21002c.txt +F: Documentation/devicetree/bindings/iio/gyroscope/nxp,fxas21002c.txt F: drivers/iio/gyro/fxas21002c_core.c F: drivers/iio/gyro/fxas21002c.h F: drivers/iio/gyro/fxas21002c_i2c.c diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 8869742a85df..0f220264cc23 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1263,7 +1263,7 @@ config SMP uniprocessor machines. On a uniprocessor machine, the kernel will run faster if you say N here. - See also , + See also , and the SMP-HOWTO available at . diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c index 07bf740bea91..31cc2f423aa8 100644 --- a/arch/arm64/kernel/kexec_image.c +++ b/arch/arm64/kernel/kexec_image.c @@ -53,7 +53,7 @@ static void *image_load(struct kimage *image, /* * We require a kernel with an unambiguous Image header. Per - * Documentation/booting.txt, this is the case when image_size + * Documentation/arm64/booting.txt, this is the case when image_size * is non-zero (practically speaking, since v3.17). */ h = (struct arm64_image_header *)kernel; diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index d87d53fcd261..9f1f7b47621c 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -395,7 +395,7 @@ config SMP Y to "Enhanced Real Time Clock Support", below. 
The "Advanced Power Management" code will be disabled if you say Y here. - See also , + See also , and the SMP-HOWTO available at . @@ -1290,7 +1290,7 @@ config MICROCODE the Linux kernel. The preferred method to load microcode from a detached initrd is described - in Documentation/x86/microcode.txt. For that you need to enable + in Documentation/x86/microcode.rst. For that you need to enable CONFIG_BLK_DEV_INITRD in order for the loader to be able to scan the initrd for microcode blobs. @@ -1329,7 +1329,7 @@ config MICROCODE_OLD_INTERFACE It is inadequate because it runs too late to be able to properly load microcode on a machine and it needs special tools. Instead, you should've switched to the early loading method with the initrd or - builtin microcode by now: Documentation/x86/microcode.txt + builtin microcode by now: Documentation/x86/microcode.rst config X86_MSR tristate "/dev/cpu/*/msr - Model-specific register support" @@ -1478,7 +1478,7 @@ config X86_5LEVEL A kernel with the option enabled can be booted on machines that support 4- or 5-level paging. - See Documentation/x86/x86_64/5level-paging.txt for more + See Documentation/x86/x86_64/5level-paging.rst for more information. Say N if unsure. @@ -1626,7 +1626,7 @@ config ARCH_MEMORY_PROBE depends on X86_64 && MEMORY_HOTPLUG help This option enables a sysfs memory/probe interface for testing. - See Documentation/memory-hotplug.txt for more information. + See Documentation/admin-guide/mm/memory-hotplug.rst for more information. If you are unsure how to answer this question, answer N. config ARCH_PROC_KCORE_TEXT @@ -1783,7 +1783,7 @@ config MTRR You can safely say Y even if your machine doesn't have MTRRs, you'll just add about 9 KB to your kernel. - See for more information. + See for more information. config MTRR_SANITIZER def_bool y @@ -1895,7 +1895,7 @@ config X86_INTEL_MPX process and adds some branches to paths used during exec() and munmap(). 
- For details, see Documentation/x86/intel_mpx.txt + For details, see Documentation/x86/intel_mpx.rst If unsure, say N. diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug index f730680dc818..59f598543203 100644 --- a/arch/x86/Kconfig.debug +++ b/arch/x86/Kconfig.debug @@ -156,7 +156,7 @@ config IOMMU_DEBUG code. When you use it make sure you have a big enough IOMMU/AGP aperture. Most of the options enabled by this can be set more finegrained using the iommu= command line - options. See Documentation/x86/x86_64/boot-options.txt for more + options. See Documentation/x86/x86_64/boot-options.rst for more details. config IOMMU_LEAK diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S index 850b8762e889..90d791ca1a95 100644 --- a/arch/x86/boot/header.S +++ b/arch/x86/boot/header.S @@ -313,7 +313,7 @@ start_sys_seg: .word SYSSEG # obsolete and meaningless, but just type_of_loader: .byte 0 # 0 means ancient bootloader, newer # bootloaders know to change this. - # See Documentation/x86/boot.txt for + # See Documentation/x86/boot.rst for # assigned ids # flags, unused bits must be zero (RFU) bit within loadflags diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 11aa3b2afa4d..33f9fc38d014 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -8,7 +8,7 @@ * * entry.S contains the system-call and fault low-level handling routines. 
* - * Some of this is documented in Documentation/x86/entry_64.txt + * Some of this is documented in Documentation/x86/entry_64.rst * * A note on terminology: * - iret frame: Architecture defined interrupt frame from SS to RIP diff --git a/arch/x86/include/asm/bootparam_utils.h b/arch/x86/include/asm/bootparam_utils.h index f6f6ef436599..101eb944f13c 100644 --- a/arch/x86/include/asm/bootparam_utils.h +++ b/arch/x86/include/asm/bootparam_utils.h @@ -24,7 +24,7 @@ static void sanitize_boot_params(struct boot_params *boot_params) * IMPORTANT NOTE TO BOOTLOADER AUTHORS: do not simply clear * this field. The purpose of this field is to guarantee * compliance with the x86 boot spec located in - * Documentation/x86/boot.txt . That spec says that the + * Documentation/x86/boot.rst . That spec says that the * *whole* structure should be cleared, after which only the * portion defined by struct setup_header (boot_params->hdr) * should be copied in. diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h index 793c14c372cb..288b065955b7 100644 --- a/arch/x86/include/asm/page_64_types.h +++ b/arch/x86/include/asm/page_64_types.h @@ -48,7 +48,7 @@ #define __START_KERNEL_map _AC(0xffffffff80000000, UL) -/* See Documentation/x86/x86_64/mm.txt for a description of the memory map. */ +/* See Documentation/x86/x86_64/mm.rst for a description of the memory map. */ #define __PHYSICAL_MASK_SHIFT 52 diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h index 88bca456da99..52e5f5f2240d 100644 --- a/arch/x86/include/asm/pgtable_64_types.h +++ b/arch/x86/include/asm/pgtable_64_types.h @@ -103,7 +103,7 @@ extern unsigned int ptrs_per_p4d; #define PGDIR_MASK (~(PGDIR_SIZE - 1)) /* - * See Documentation/x86/x86_64/mm.txt for a description of the memory map. + * See Documentation/x86/x86_64/mm.rst for a description of the memory map. * * Be very careful vs. KASLR when changing anything here. 
The KASLR address * range must not overlap with anything except the KASAN shadow area, which diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c index e1f3ba19ba54..06d4e67f31ab 100644 --- a/arch/x86/kernel/cpu/microcode/amd.c +++ b/arch/x86/kernel/cpu/microcode/amd.c @@ -61,7 +61,7 @@ static u8 amd_ucode_patch[PATCH_MAX_SIZE]; /* * Microcode patch container file is prepended to the initrd in cpio - * format. See Documentation/x86/microcode.txt + * format. See Documentation/x86/microcode.rst */ static const char ucode_path[] __maybe_unused = "kernel/x86/microcode/AuthenticAMD.bin"; diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c index 22f60dd26460..b07e7069b09e 100644 --- a/arch/x86/kernel/kexec-bzimage64.c +++ b/arch/x86/kernel/kexec-bzimage64.c @@ -416,7 +416,7 @@ static void *bzImage64_load(struct kimage *image, char *kernel, efi_map_offset = params_cmdline_sz; efi_setup_data_offset = efi_map_offset + ALIGN(efi_map_sz, 16); - /* Copy setup header onto bootparams. Documentation/x86/boot.txt */ + /* Copy setup header onto bootparams. Documentation/x86/boot.rst */ setup_header_size = 0x0202 + kernel[0x0201] - setup_hdr_offset; /* Is there a limit on setup header size? */ diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c index dcd272dbd0a9..f62b498b18fb 100644 --- a/arch/x86/kernel/pci-dma.c +++ b/arch/x86/kernel/pci-dma.c @@ -70,7 +70,7 @@ void __init pci_iommu_alloc(void) } /* - * See for the iommu kernel + * See for the iommu kernel * parameter documentation. */ static __init int iommu_setup(char *p) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 7f61431c75fb..400c1ba033aa 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -711,7 +711,7 @@ void native_flush_tlb_others(const struct cpumask *cpumask, } /* - * See Documentation/x86/tlb.txt for details. We choose 33 + * See Documentation/x86/tlb.rst for details. 
We choose 33 * because it is large enough to cover the vast majority (at * least 95%) of allocations, and is small enough that we are * confident it will not cause too much overhead. Each single diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index 1861a2ba0f2b..c0a502f7e3a7 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -86,7 +86,7 @@ static void __init init_pvh_bootparams(bool xen_guest) } /* - * See Documentation/x86/boot.txt. + * See Documentation/x86/boot.rst. * * Version 2.12 supports Xen entry point but we will use default x86/PC * environment (i.e. hardware_subarch 0). diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index 283ee94224c6..2438f37f2ca1 100644 --- a/drivers/acpi/Kconfig +++ b/drivers/acpi/Kconfig @@ -333,7 +333,7 @@ config ACPI_CUSTOM_DSDT_FILE depends on !STANDALONE help This option supports a custom DSDT by linking it into the kernel. - See Documentation/acpi/dsdt-override.txt + See Documentation/admin-guide/acpi/dsdt-override.rst Enter the full path name to the file which includes the AmlCode or dsdt_aml_code declaration. @@ -355,7 +355,7 @@ config ACPI_TABLE_UPGRADE This option provides functionality to upgrade arbitrary ACPI tables via initrd. No functional change if no ACPI tables are passed via initrd, therefore it's safe to say Y. - See Documentation/acpi/initrd_table_override.txt for details + See Documentation/admin-guide/acpi/initrd_table_override.rst for details config ACPI_TABLE_OVERRIDE_VIA_BUILTIN_INITRD bool "Override ACPI tables from built-in initrd" @@ -365,7 +365,7 @@ config ACPI_TABLE_OVERRIDE_VIA_BUILTIN_INITRD This option provides functionality to override arbitrary ACPI tables from built-in uncompressed initrd. 
- See Documentation/acpi/initrd_table_override.txt for details + See Documentation/admin-guide/acpi/initrd_table_override.rst for details config ACPI_DEBUG bool "Debug Statements" @@ -374,7 +374,7 @@ config ACPI_DEBUG output and increases the kernel size by around 50K. Use the acpi.debug_layer and acpi.debug_level kernel command-line - parameters documented in Documentation/acpi/debug.txt and + parameters documented in Documentation/firmware-guide/acpi/debug.rst and Documentation/admin-guide/kernel-parameters.rst to control the type and amount of debug output. @@ -445,7 +445,7 @@ config ACPI_CUSTOM_METHOD help This debug facility allows ACPI AML methods to be inserted and/or replaced without rebooting the system. For details refer to: - Documentation/acpi/method-customizing.txt. + Documentation/firmware-guide/acpi/method-customizing.rst. NOTE: This option is security sensitive, because it allows arbitrary kernel memory to be written to by root (uid=0) users, allowing them diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c index b17b79e612a3..ac6280ad43a1 100644 --- a/drivers/net/ethernet/faraday/ftgmac100.c +++ b/drivers/net/ethernet/faraday/ftgmac100.c @@ -1075,7 +1075,7 @@ static int ftgmac100_mii_probe(struct ftgmac100 *priv, phy_interface_t intf) } /* Indicate that we support PAUSE frames (see comment in - * Documentation/networking/phy.txt) + * Documentation/networking/phy.rst) */ phy_support_asym_pause(phydev); diff --git a/drivers/staging/fieldbus/Documentation/fieldbus_dev.txt b/drivers/staging/fieldbus/Documentation/fieldbus_dev.txt index 56af3f650fa3..89fb8e14676f 100644 --- a/drivers/staging/fieldbus/Documentation/fieldbus_dev.txt +++ b/drivers/staging/fieldbus/Documentation/fieldbus_dev.txt @@ -54,8 +54,8 @@ a limited few common behaviours and properties. 
This allows us to define a simple interface consisting of a character device and a set of sysfs files: See: -Documentation/ABI/testing/sysfs-class-fieldbus-dev -Documentation/ABI/testing/fieldbus-dev-cdev +drivers/staging/fieldbus/Documentation/ABI/sysfs-class-fieldbus-dev +drivers/staging/fieldbus/Documentation/ABI/fieldbus-dev-cdev Note that this simple interface does not provide a way to modify adapter configuration settings. It is therefore useful only for adapters that get their diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 1e3ed41ae1f3..69938dbae2d0 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -1694,7 +1694,7 @@ EXPORT_SYMBOL_GPL(vhost_dev_ioctl); /* TODO: This is really inefficient. We need something like get_user() * (instruction directly accesses the data, with an exception table entry - * returning -EFAULT). See Documentation/x86/exception-tables.txt. + * returning -EFAULT). See Documentation/x86/exception-tables.rst. */ static int set_bit_to_user(int nr, void __user *addr) { diff --git a/include/acpi/acpi_drivers.h b/include/acpi/acpi_drivers.h index de1804aeaf69..98e3db7a89cd 100644 --- a/include/acpi/acpi_drivers.h +++ b/include/acpi/acpi_drivers.h @@ -25,7 +25,7 @@ #define ACPI_MAX_STRING 80 /* - * Please update drivers/acpi/debug.c and Documentation/acpi/debug.txt + * Please update drivers/acpi/debug.c and Documentation/firmware-guide/acpi/debug.rst * if you add to this list. */ #define ACPI_BUS_COMPONENT 0x00010000 diff --git a/include/linux/fs_context.h b/include/linux/fs_context.h index 1f966670c8dc..623eb58560b9 100644 --- a/include/linux/fs_context.h +++ b/include/linux/fs_context.h @@ -85,7 +85,7 @@ struct fs_parameter { * Superblock creation fills in ->root whereas reconfiguration begins with this * already set. 
* - * See Documentation/filesystems/mounting.txt + * See Documentation/filesystems/mount_api.txt */ struct fs_context { const struct fs_context_operations *ops; diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 47f58cfb6a19..df1318d85f7d 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -77,7 +77,7 @@ * state. This is called immediately after commit_creds(). * * Security hooks for mount using fs_context. - * [See also Documentation/filesystems/mounting.txt] + * [See also Documentation/filesystems/mount_api.txt] * * @fs_context_dup: * Allocate and attach a security structure to sc->security. This pointer diff --git a/mm/Kconfig b/mm/Kconfig index ee8d1f311858..6e5fb81bde4b 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -165,7 +165,7 @@ config MEMORY_HOTPLUG_DEFAULT_ONLINE onlining policy (/sys/devices/system/memory/auto_online_blocks) which determines what happens to newly added memory regions. Policy setting can always be changed at runtime. - See Documentation/memory-hotplug.txt for more information. + See Documentation/admin-guide/mm/memory-hotplug.rst for more information. Say Y here if you want all hot-plugged memory blocks to appear in 'online' state by default. diff --git a/security/Kconfig b/security/Kconfig index aeac3676dd4d..6d75ed71970c 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -62,7 +62,7 @@ config PAGE_TABLE_ISOLATION ensuring that the majority of kernel addresses are not mapped into userspace. - See Documentation/x86/pti.txt for more details. + See Documentation/x86/pti.rst for more details. 
config SECURITY_INFINIBAND bool "Infiniband Security Hooks" diff --git a/tools/include/linux/err.h b/tools/include/linux/err.h index 2f5a12b88a86..25f2bb3a991d 100644 --- a/tools/include/linux/err.h +++ b/tools/include/linux/err.h @@ -20,7 +20,7 @@ * Userspace note: * The same principle works for userspace, because 'error' pointers * fall down to the unused hole far from user space, as described - * in Documentation/x86/x86_64/mm.txt for x86_64 arch: + * in Documentation/x86/x86_64/mm.rst for x86_64 arch: * * 0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm hole caused by [48:63] sign extension * ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole diff --git a/tools/objtool/Documentation/stack-validation.txt b/tools/objtool/Documentation/stack-validation.txt index 4dd11a554b9b..de094670050b 100644 --- a/tools/objtool/Documentation/stack-validation.txt +++ b/tools/objtool/Documentation/stack-validation.txt @@ -21,7 +21,7 @@ instructions). Similarly, it knows how to follow switch statements, for which gcc sometimes uses jump tables. (Objtool also has an 'orc generate' subcommand which generates debuginfo -for the ORC unwinder. See Documentation/x86/orc-unwinder.txt in the +for the ORC unwinder. See Documentation/x86/orc-unwinder.rst in the kernel tree for more details.) @@ -101,7 +101,7 @@ b) ORC (Oops Rewind Capability) unwind table generation band. So it doesn't affect runtime performance and it can be reliable even when interrupts or exceptions are involved. - For more details, see Documentation/x86/orc-unwinder.txt. + For more details, see Documentation/x86/orc-unwinder.rst. 
c) Higher live patching compatibility rate From 9915ec28ec7fc79f0f30ebbba5d19bfa17eb7f03 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 7 Jun 2019 15:54:34 -0300 Subject: [PATCH 064/129] docs: isdn: remove hisax references from kernel-parameters.txt The hisax driver got removed on 85993b8c9786 ("isdn: remove hisax driver"), but a left-over was kept at kernel-parameters.txt. Fixes: 85993b8c9786 ("isdn: remove hisax driver") Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/kernel-parameters.txt | 3 --- 1 file changed, 3 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 1abd7e145357..9b16b640ce48 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1388,9 +1388,6 @@ Valid parameters: "on", "off" Default: "on" - hisax= [HW,ISDN] - See Documentation/isdn/README.HiSax. - hlt [BUGS=ARM,SH] hpet= [X86-32,HPET] option to control HPET usage From 5c437fa29561f5809ef114ba3a5e80556cc43fb3 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 7 Jun 2019 15:54:35 -0300 Subject: [PATCH 065/129] docs: fs: fix broken links to vfs.txt which was renamed to vfs.rst A recent documentation conversion renamed this file but forgot to update the links.
Fixes: af96c1e304f7 ("docs: filesystems: vfs: Convert vfs.txt to RST") Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- Documentation/filesystems/porting | 10 +++++----- include/linux/dcache.h | 4 ++-- include/linux/fs.h | 2 +- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting index 3bd1148d8bb6..2813a19389fe 100644 --- a/Documentation/filesystems/porting +++ b/Documentation/filesystems/porting @@ -330,14 +330,14 @@ unreferenced dentries, and is now only called when the dentry refcount goes to [mandatory] .d_compare() calling convention and locking rules are significantly -changed. Read updated documentation in Documentation/filesystems/vfs.txt (and +changed. Read updated documentation in Documentation/filesystems/vfs.rst (and look at examples of other filesystems) for guidance. --- [mandatory] .d_hash() calling convention and locking rules are significantly -changed. Read updated documentation in Documentation/filesystems/vfs.txt (and +changed. Read updated documentation in Documentation/filesystems/vfs.rst (and look at examples of other filesystems) for guidance. --- @@ -377,12 +377,12 @@ where possible. the filesystem provides it), which requires dropping out of rcu-walk mode. This may now be called in rcu-walk mode (nd->flags & LOOKUP_RCU). -ECHILD should be returned if the filesystem cannot handle rcu-walk. See -Documentation/filesystems/vfs.txt for more details. +Documentation/filesystems/vfs.rst for more details. permission is an inode permission check that is called on many or all directory inodes on the way down a path walk (to check for exec permission). It must now be rcu-walk aware (mask & MAY_NOT_BLOCK). See -Documentation/filesystems/vfs.txt for more details. +Documentation/filesystems/vfs.rst for more details. -- [mandatory] @@ -625,7 +625,7 @@ in your dentry operations instead. 
-- [mandatory] ->clone_file_range() and ->dedupe_file_range have been replaced with - ->remap_file_range(). See Documentation/filesystems/vfs.txt for more + ->remap_file_range(). See Documentation/filesystems/vfs.rst for more information. -- [recommended] diff --git a/include/linux/dcache.h b/include/linux/dcache.h index f14e587c5d5d..5e0eadf7de55 100644 --- a/include/linux/dcache.h +++ b/include/linux/dcache.h @@ -153,7 +153,7 @@ struct dentry_operations { * Locking rules for dentry_operations callbacks are to be found in * Documentation/filesystems/Locking. Keep it updated! * - * FUrther descriptions are found in Documentation/filesystems/vfs.txt. + * FUrther descriptions are found in Documentation/filesystems/vfs.rst. * Keep it updated too! */ @@ -568,7 +568,7 @@ static inline struct dentry *d_backing_dentry(struct dentry *upper) * If dentry is on a union/overlay, then return the underlying, real dentry. * Otherwise return the dentry itself. * - * See also: Documentation/filesystems/vfs.txt + * See also: Documentation/filesystems/vfs.rst */ static inline struct dentry *d_real(struct dentry *dentry, const struct inode *inode) diff --git a/include/linux/fs.h b/include/linux/fs.h index f7fdfe93e25d..c564cf3f48d9 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1769,7 +1769,7 @@ struct block_device_operations; /* * These flags control the behavior of the remap_file_range function pointer. * If it is called with len == 0 that means "remap to end of source file". - * See Documentation/filesystems/vfs.txt for more details about this call. + * See Documentation/filesystems/vfs.rst for more details about this call. * * REMAP_FILE_DEDUP: only remap if contents identical (i.e. 
deduplicate) * REMAP_FILE_CAN_SHORTEN: caller can handle a shortened request From b640fbad2d8fe120c761f61eb6c96f05047100cd Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 7 Jun 2019 15:54:36 -0300 Subject: [PATCH 066/129] docs: pci: fix broken links due to conversion from pci.txt to pci.rst Some documentation files were still pointing to the old place. Fixes: 229b4e0728e0 ("Documentation: PCI: convert pci.txt to reST") Signed-off-by: Mauro Carvalho Chehab Acked-by: Paul E. McKenney Signed-off-by: Jonathan Corbet --- Documentation/memory-barriers.txt | 2 +- Documentation/translations/ko_KR/memory-barriers.txt | 2 +- drivers/scsi/hpsa.c | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index f70ebcdfe592..f4170aae1d75 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -548,7 +548,7 @@ There are certain things that the Linux kernel memory barriers do not guarantee: [*] For information on bus mastering DMA and coherency please read: - Documentation/PCI/pci.txt + Documentation/PCI/pci.rst Documentation/DMA-API-HOWTO.txt Documentation/DMA-API.txt diff --git a/Documentation/translations/ko_KR/memory-barriers.txt b/Documentation/translations/ko_KR/memory-barriers.txt index db0b9d8619f1..07725b1df002 100644 --- a/Documentation/translations/ko_KR/memory-barriers.txt +++ b/Documentation/translations/ko_KR/memory-barriers.txt @@ -569,7 +569,7 @@ ACQUIRE 는 해당 오퍼레이션의 로드 부분에만 적용되고 RELEASE [*] 버스 마스터링 DMA 와 일관성에 대해서는 다음을 참고하시기 바랍니다: - Documentation/PCI/pci.txt + Documentation/PCI/pci.rst Documentation/DMA-API-HOWTO.txt Documentation/DMA-API.txt diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c index 1bef1da273c2..53df6f7dd3f9 100644 --- a/drivers/scsi/hpsa.c +++ b/drivers/scsi/hpsa.c @@ -7760,7 +7760,7 @@ static void hpsa_free_pci_init(struct ctlr_info *h) hpsa_disable_interrupt_mode(h); /* pci_init 2 */ /* * call pci_disable_device 
before pci_release_regions per - * Documentation/PCI/pci.txt + * Documentation/PCI/pci.rst */ pci_disable_device(h->pdev); /* pci_init 1 */ pci_release_regions(h->pdev); /* pci_init 2 */ @@ -7843,7 +7843,7 @@ clean2: /* intmode+region, pci */ clean1: /* * call pci_disable_device before pci_release_regions per - * Documentation/PCI/pci.txt + * Documentation/PCI/pci.rst */ pci_disable_device(h->pdev); pci_release_regions(h->pdev); From ce1a5ea18ef9bf4c62c75abe7c540a29264ec988 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 14 Jun 2019 09:02:49 +0200 Subject: [PATCH 067/129] Documentation: Remove duplicate x86 index entry x86 got added twice to the index via the RST conversion and the MDS documentation changes. Remove one instance. Signed-off-by: Thomas Gleixner Signed-off-by: Jonathan Corbet --- Documentation/index.rst | 1 - 1 file changed, 1 deletion(-) diff --git a/Documentation/index.rst b/Documentation/index.rst index a7566ef62411..781042b4579d 100644 --- a/Documentation/index.rst +++ b/Documentation/index.rst @@ -112,7 +112,6 @@ implementation. .. toctree:: :maxdepth: 2 - x86/index sh/index x86/index From 305a99eb98af22996e9771078b7a19978732ed41 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Wed, 12 Jun 2019 14:52:37 -0300 Subject: [PATCH 068/129] docs: aoe: convert docs to ReST and rename to *.rst There are only two files within Documentation/aoe dir that are documentation. The remaining ones are examples and shell scripts. Convert the two AoE files to ReST format, and add the others as literal, as they're part of the documentation. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. 
Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- Documentation/aoe/{aoe.txt => aoe.rst} | 63 +++++++++++++----------- Documentation/aoe/examples.rst | 23 +++++++++ Documentation/aoe/index.rst | 19 +++++++ Documentation/aoe/{todo.txt => todo.rst} | 3 ++ Documentation/aoe/udev.txt | 2 +- 5 files changed, 81 insertions(+), 29 deletions(-) rename Documentation/aoe/{aoe.txt => aoe.rst} (79%) create mode 100644 Documentation/aoe/examples.rst create mode 100644 Documentation/aoe/index.rst rename Documentation/aoe/{todo.txt => todo.rst} (98%) diff --git a/Documentation/aoe/aoe.txt b/Documentation/aoe/aoe.rst similarity index 79% rename from Documentation/aoe/aoe.txt rename to Documentation/aoe/aoe.rst index c71487d399d1..58747ecec71d 100644 --- a/Documentation/aoe/aoe.txt +++ b/Documentation/aoe/aoe.rst @@ -1,3 +1,6 @@ +Introduction +============ + ATA over Ethernet is a network protocol that provides simple access to block storage on the LAN. @@ -22,7 +25,8 @@ document the use of the driver and are not necessary if you install the aoetools. -CREATING DEVICE NODES +Creating Device Nodes +===================== Users of udev should find the block device nodes created automatically, but to create all the necessary device nodes, use the @@ -38,7 +42,8 @@ CREATING DEVICE NODES confusing when an AoE device is not present the first time the a command is run but appears a second later. -USING DEVICE NODES +Using Device Nodes +================== "cat /dev/etherd/err" blocks, waiting for error diagnostic output, like any retransmitted packets. @@ -55,7 +60,7 @@ USING DEVICE NODES by sysfs counterparts. Using the commands in aoetools insulates users from these implementation details. - The block devices are named like this: + The block devices are named like this:: e{shelf}.{slot} e{shelf}.{slot}p{part} @@ -64,7 +69,8 @@ USING DEVICE NODES first shelf (shelf address zero). That's the whole disk. The first partition on that disk would be "e0.2p1". 
-USING SYSFS +Using sysfs +=========== Each aoe block device in /sys/block has the extra attributes of state, mac, and netif. The state attribute is "up" when the device @@ -78,29 +84,29 @@ USING SYSFS There is a script in this directory that formats this information in a convenient way. Users with aoetools should use the aoe-stat - command. + command:: - root@makki root# sh Documentation/aoe/status.sh - e10.0 eth3 up - e10.1 eth3 up - e10.2 eth3 up - e10.3 eth3 up - e10.4 eth3 up - e10.5 eth3 up - e10.6 eth3 up - e10.7 eth3 up - e10.8 eth3 up - e10.9 eth3 up - e4.0 eth1 up - e4.1 eth1 up - e4.2 eth1 up - e4.3 eth1 up - e4.4 eth1 up - e4.5 eth1 up - e4.6 eth1 up - e4.7 eth1 up - e4.8 eth1 up - e4.9 eth1 up + root@makki root# sh Documentation/aoe/status.sh + e10.0 eth3 up + e10.1 eth3 up + e10.2 eth3 up + e10.3 eth3 up + e10.4 eth3 up + e10.5 eth3 up + e10.6 eth3 up + e10.7 eth3 up + e10.8 eth3 up + e10.9 eth3 up + e4.0 eth1 up + e4.1 eth1 up + e4.2 eth1 up + e4.3 eth1 up + e4.4 eth1 up + e4.5 eth1 up + e4.6 eth1 up + e4.7 eth1 up + e4.8 eth1 up + e4.9 eth1 up Use /sys/module/aoe/parameters/aoe_iflist (or better, the driver option discussed below) instead of /dev/etherd/interfaces to limit @@ -113,12 +119,13 @@ USING SYSFS for this purpose. You can also directly use the /dev/etherd/discover special file described above. -DRIVER OPTIONS +Driver Options +============== There is a boot option for the built-in aoe driver and a corresponding module parameter, aoe_iflist. Without this option, all network interfaces may be used for ATA over Ethernet. Here is a - usage example for the module parameter. + usage example for the module parameter:: modprobe aoe_iflist="eth1 eth3" diff --git a/Documentation/aoe/examples.rst b/Documentation/aoe/examples.rst new file mode 100644 index 000000000000..91f3198e52c1 --- /dev/null +++ b/Documentation/aoe/examples.rst @@ -0,0 +1,23 @@ +Example of udev rules +--------------------- + + .. 
include:: udev.txt + :literal: + +Example of udev install rules script +------------------------------------ + + .. literalinclude:: udev-install.sh + :language: shell + +Example script to get status +---------------------------- + + .. literalinclude:: status.sh + :language: shell + +Example of AoE autoload script +------------------------------ + + .. literalinclude:: autoload.sh + :language: shell diff --git a/Documentation/aoe/index.rst b/Documentation/aoe/index.rst new file mode 100644 index 000000000000..4394b9b7913c --- /dev/null +++ b/Documentation/aoe/index.rst @@ -0,0 +1,19 @@ +:orphan: + +======================= +ATA over Ethernet (AoE) +======================= + +.. toctree:: + :maxdepth: 1 + + aoe + todo + examples + +.. only:: subproject and html + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/aoe/todo.txt b/Documentation/aoe/todo.rst similarity index 98% rename from Documentation/aoe/todo.txt rename to Documentation/aoe/todo.rst index c09dfad4aed8..dea8db5a33e1 100644 --- a/Documentation/aoe/todo.txt +++ b/Documentation/aoe/todo.rst @@ -1,3 +1,6 @@ +TODO +==== + There is a potential for deadlock when allocating a struct sk_buff for data that needs to be written out to aoe storage. 
If the data is being written from a dirty page in order to free that page, and if diff --git a/Documentation/aoe/udev.txt b/Documentation/aoe/udev.txt index 1f06daf03f5b..54feda5a0772 100644 --- a/Documentation/aoe/udev.txt +++ b/Documentation/aoe/udev.txt @@ -11,7 +11,7 @@ # udev_rules="/etc/udev/rules.d/" # bash# ls /etc/udev/rules.d/ # 10-wacom.rules 50-udev.rules -# bash# cp /path/to/linux-2.6.xx/Documentation/aoe/udev.txt \ +# bash# cp /path/to/linux/Documentation/aoe/udev.txt \ # /etc/udev/rules.d/60-aoe.rules # From b693d0b372afb39432e1c49ad7b3454855bc6bed Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Wed, 12 Jun 2019 14:52:38 -0300 Subject: [PATCH 069/129] docs: arm64: convert docs to ReST and rename to .rst The documentation is in a format that is very close to ReST format. The conversion is actually: - add blank lines in order to identify paragraphs; - fixing tables markups; - adding some lists markups; - marking literal blocks; - adjust some title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. 
Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- ...object_usage.txt => acpi_object_usage.rst} | 288 ++++++++++++------ .../arm64/{arm-acpi.txt => arm-acpi.rst} | 155 +++++----- .../arm64/{booting.txt => booting.rst} | 91 ++++-- ...egisters.txt => cpu-feature-registers.rst} | 204 +++++++------ .../arm64/{elf_hwcaps.txt => elf_hwcaps.rst} | 56 +--- .../{hugetlbpage.txt => hugetlbpage.rst} | 7 +- Documentation/arm64/index.rst | 28 ++ ...structions.txt => legacy_instructions.rst} | 43 ++- Documentation/arm64/memory.rst | 98 ++++++ Documentation/arm64/memory.txt | 97 ------ ...ication.txt => pointer-authentication.rst} | 2 + ...{silicon-errata.txt => silicon-errata.rst} | 65 +++- Documentation/arm64/{sve.txt => sve.rst} | 12 +- ...agged-pointers.txt => tagged-pointers.rst} | 6 +- .../translations/zh_CN/arm64/booting.txt | 4 +- .../zh_CN/arm64/legacy_instructions.txt | 4 +- .../translations/zh_CN/arm64/memory.txt | 4 +- .../zh_CN/arm64/silicon-errata.txt | 4 +- .../zh_CN/arm64/tagged-pointers.txt | 4 +- Documentation/virtual/kvm/api.txt | 2 +- arch/arm64/include/asm/efi.h | 2 +- arch/arm64/include/asm/image.h | 2 +- arch/arm64/include/uapi/asm/sigcontext.h | 2 +- arch/arm64/kernel/kexec_image.c | 2 +- 24 files changed, 703 insertions(+), 479 deletions(-) rename Documentation/arm64/{acpi_object_usage.txt => acpi_object_usage.rst} (84%) rename Documentation/arm64/{arm-acpi.txt => arm-acpi.rst} (86%) rename Documentation/arm64/{booting.txt => booting.rst} (86%) rename Documentation/arm64/{cpu-feature-registers.txt => cpu-feature-registers.rst} (65%) rename Documentation/arm64/{elf_hwcaps.txt => elf_hwcaps.rst} (92%) rename Documentation/arm64/{hugetlbpage.txt => hugetlbpage.rst} (86%) create mode 100644 Documentation/arm64/index.rst rename Documentation/arm64/{legacy_instructions.txt => legacy_instructions.rst} (73%) create mode 100644 Documentation/arm64/memory.rst delete mode 100644 Documentation/arm64/memory.txt rename 
Documentation/arm64/{pointer-authentication.txt => pointer-authentication.rst} (99%) rename Documentation/arm64/{silicon-errata.txt => silicon-errata.rst} (55%) rename Documentation/arm64/{sve.txt => sve.rst} (98%) rename Documentation/arm64/{tagged-pointers.txt => tagged-pointers.rst} (94%) diff --git a/Documentation/arm64/acpi_object_usage.txt b/Documentation/arm64/acpi_object_usage.rst similarity index 84% rename from Documentation/arm64/acpi_object_usage.txt rename to Documentation/arm64/acpi_object_usage.rst index c77010c5c1f0..d51b69dc624d 100644 --- a/Documentation/arm64/acpi_object_usage.txt +++ b/Documentation/arm64/acpi_object_usage.rst @@ -1,5 +1,7 @@ +=========== ACPI Tables ------------ +=========== + The expectations of individual ACPI tables are discussed in the list that follows. @@ -11,54 +13,71 @@ outside of the UEFI Forum (see Section 5.2.6 of the specification). For ACPI on arm64, tables also fall into the following categories: - -- Required: DSDT, FADT, GTDT, MADT, MCFG, RSDP, SPCR, XSDT + - Required: DSDT, FADT, GTDT, MADT, MCFG, RSDP, SPCR, XSDT - -- Recommended: BERT, EINJ, ERST, HEST, PCCT, SSDT + - Recommended: BERT, EINJ, ERST, HEST, PCCT, SSDT - -- Optional: BGRT, CPEP, CSRT, DBG2, DRTM, ECDT, FACS, FPDT, IORT, + - Optional: BGRT, CPEP, CSRT, DBG2, DRTM, ECDT, FACS, FPDT, IORT, MCHI, MPST, MSCT, NFIT, PMTT, RASF, SBST, SLIT, SPMI, SRAT, STAO, TCPA, TPM2, UEFI, XENV - -- Not supported: BOOT, DBGP, DMAR, ETDT, HPET, IBFT, IVRS, LPIT, + - Not supported: BOOT, DBGP, DMAR, ETDT, HPET, IBFT, IVRS, LPIT, MSDM, OEMx, PSDT, RSDT, SLIC, WAET, WDAT, WDRT, WPBT +====== ======================================================================== Table Usage for ARMv8 Linux ------ ---------------------------------------------------------------- +====== ======================================================================== BERT Section 18.3 (signature == "BERT") - == Boot Error Record Table == + + **Boot Error Record Table** + Must be supplied if RAS 
support is provided by the platform. It is recommended this table be supplied. BOOT Signature Reserved (signature == "BOOT") - == simple BOOT flag table == + + **simple BOOT flag table** + Microsoft only table, will not be supported. BGRT Section 5.2.22 (signature == "BGRT") - == Boot Graphics Resource Table == + + **Boot Graphics Resource Table** + Optional, not currently supported, with no real use-case for an ARM server. CPEP Section 5.2.18 (signature == "CPEP") - == Corrected Platform Error Polling table == + + **Corrected Platform Error Polling table** + Optional, not currently supported, and not recommended until such time as ARM-compatible hardware is available, and the specification suitably modified. CSRT Signature Reserved (signature == "CSRT") - == Core System Resources Table == + + **Core System Resources Table** + Optional, not currently supported. DBG2 Signature Reserved (signature == "DBG2") - == DeBuG port table 2 == + + **DeBuG port table 2** + License has changed and should be usable. Optional if used instead of earlycon= on the command line. DBGP Signature Reserved (signature == "DBGP") - == DeBuG Port table == + + **DeBuG Port table** + Microsoft only table, will not be supported. DSDT Section 5.2.11.1 (signature == "DSDT") - == Differentiated System Description Table == + + **Differentiated System Description Table** + A DSDT is required; see also SSDT. ACPI tables contain only one DSDT but can contain one or more SSDTs, @@ -66,22 +85,30 @@ DSDT Section 5.2.11.1 (signature == "DSDT") but cannot modify or replace anything in the DSDT. DMAR Signature Reserved (signature == "DMAR") - == DMA Remapping table == + + **DMA Remapping table** + x86 only table, will not be supported. DRTM Signature Reserved (signature == "DRTM") - == Dynamic Root of Trust for Measurement table == + + **Dynamic Root of Trust for Measurement table** + Optional, not currently supported. 
ECDT Section 5.2.16 (signature == "ECDT") - == Embedded Controller Description Table == + + **Embedded Controller Description Table** + Optional, not currently supported, but could be used on ARM if and only if one uses the GPE_BIT field to represent an IRQ number, since there are no GPE blocks defined in hardware reduced mode. This would need to be modified in the ACPI specification. EINJ Section 18.6 (signature == "EINJ") - == Error Injection table == + + **Error Injection table** + This table is very useful for testing platform response to error conditions; it allows one to inject an error into the system as if it had actually occurred. However, this table should not be @@ -89,27 +116,35 @@ EINJ Section 18.6 (signature == "EINJ") and executed with the ACPICA tools only during testing. ERST Section 18.5 (signature == "ERST") - == Error Record Serialization Table == + + **Error Record Serialization Table** + On a platform supports RAS, this table must be supplied if it is not UEFI-based; if it is UEFI-based, this table may be supplied. When this table is not present, UEFI run time service will be utilized to save and retrieve hardware error information to and from a persistent store. ETDT Signature Reserved (signature == "ETDT") - == Event Timer Description Table == + + **Event Timer Description Table** + Obsolete table, will not be supported. FACS Section 5.2.10 (signature == "FACS") - == Firmware ACPI Control Structure == + + **Firmware ACPI Control Structure** + It is unlikely that this table will be terribly useful. If it is provided, the Global Lock will NOT be used since it is not part of the hardware reduced profile, and only 64-bit address fields will be considered valid. FADT Section 5.2.9 (signature == "FACP") - == Fixed ACPI Description Table == + + **Fixed ACPI Description Table** Required for arm64. + The HW_REDUCED_ACPI flag must be set. All of the fields that are to be ignored when HW_REDUCED_ACPI is set are expected to be set to zero. 
@@ -118,22 +153,28 @@ FADT Section 5.2.9 (signature == "FACP") used, not FIRMWARE_CTRL. If PSCI is used (as is recommended), make sure that ARM_BOOT_ARCH is - filled in properly -- that the PSCI_COMPLIANT flag is set and that + filled in properly - that the PSCI_COMPLIANT flag is set and that PSCI_USE_HVC is set or unset as needed (see table 5-37). For the DSDT that is also required, the X_DSDT field is to be used, not the DSDT field. FPDT Section 5.2.23 (signature == "FPDT") - == Firmware Performance Data Table == + + **Firmware Performance Data Table** + Optional, not currently supported. GTDT Section 5.2.24 (signature == "GTDT") - == Generic Timer Description Table == + + **Generic Timer Description Table** + Required for arm64. HEST Section 18.3.2 (signature == "HEST") - == Hardware Error Source Table == + + **Hardware Error Source Table** + ARM-specific error sources have been defined; please use those or the PCI types such as type 6 (AER Root Port), 7 (AER Endpoint), or 8 (AER Bridge), or use type 9 (Generic Hardware Error Source). Firmware first @@ -144,122 +185,174 @@ HEST Section 18.3.2 (signature == "HEST") is recommended this table be supplied. HPET Signature Reserved (signature == "HPET") - == High Precision Event timer Table == + + **High Precision Event timer Table** + x86 only table, will not be supported. IBFT Signature Reserved (signature == "IBFT") - == iSCSI Boot Firmware Table == + + **iSCSI Boot Firmware Table** + Microsoft defined table, support TBD. IORT Signature Reserved (signature == "IORT") - == Input Output Remapping Table == + + **Input Output Remapping Table** + arm64 only table, required in order to describe IO topology, SMMUs, and GIC ITSs, and how those various components are connected together, such as identifying which components are behind which SMMUs/ITSs. 
This table will only be required on certain SBSA platforms (e.g., - when using GICv3-ITS and an SMMU); on SBSA Level 0 platforms, it + when using GICv3-ITS and an SMMU); on SBSA Level 0 platforms, it remains optional. IVRS Signature Reserved (signature == "IVRS") - == I/O Virtualization Reporting Structure == + + **I/O Virtualization Reporting Structure** + x86_64 (AMD) only table, will not be supported. LPIT Signature Reserved (signature == "LPIT") - == Low Power Idle Table == + + **Low Power Idle Table** + x86 only table as of ACPI 5.1; starting with ACPI 6.0, processor descriptions and power states on ARM platforms should use the DSDT and define processor container devices (_HID ACPI0010, Section 8.4, and more specifically 8.4.3 and and 8.4.4). MADT Section 5.2.12 (signature == "APIC") - == Multiple APIC Description Table == + + **Multiple APIC Description Table** + Required for arm64. Only the GIC interrupt controller structures should be used (types 0xA - 0xF). MCFG Signature Reserved (signature == "MCFG") - == Memory-mapped ConFiGuration space == + + **Memory-mapped ConFiGuration space** + If the platform supports PCI/PCIe, an MCFG table is required. MCHI Signature Reserved (signature == "MCHI") - == Management Controller Host Interface table == + + **Management Controller Host Interface table** + Optional, not currently supported. MPST Section 5.2.21 (signature == "MPST") - == Memory Power State Table == + + **Memory Power State Table** + Optional, not currently supported. MSCT Section 5.2.19 (signature == "MSCT") - == Maximum System Characteristic Table == + + **Maximum System Characteristic Table** + Optional, not currently supported. MSDM Signature Reserved (signature == "MSDM") - == Microsoft Data Management table == + + **Microsoft Data Management table** + Microsoft only table, will not be supported. 
NFIT Section 5.2.25 (signature == "NFIT") - == NVDIMM Firmware Interface Table == + + **NVDIMM Firmware Interface Table** + Optional, not currently supported. OEMx Signature of "OEMx" only - == OEM Specific Tables == + + **OEM Specific Tables** + All tables starting with a signature of "OEM" are reserved for OEM use. Since these are not meant to be of general use but are limited to very specific end users, they are not recommended for use and are not supported by the kernel for arm64. PCCT Section 14.1 (signature == "PCCT) - == Platform Communications Channel Table == + + **Platform Communications Channel Table** + Recommend for use on arm64; use of PCC is recommended when using CPPC to control performance and power for platform processors. PMTT Section 5.2.21.12 (signature == "PMTT") - == Platform Memory Topology Table == + + **Platform Memory Topology Table** + Optional, not currently supported. PSDT Section 5.2.11.3 (signature == "PSDT") - == Persistent System Description Table == + + **Persistent System Description Table** + Obsolete table, will not be supported. RASF Section 5.2.20 (signature == "RASF") - == RAS Feature table == + + **RAS Feature table** + Optional, not currently supported. RSDP Section 5.2.5 (signature == "RSD PTR") - == Root System Description PoinTeR == + + **Root System Description PoinTeR** + Required for arm64. RSDT Section 5.2.7 (signature == "RSDT") - == Root System Description Table == + + **Root System Description Table** + Since this table can only provide 32-bit addresses, it is deprecated on arm64, and will not be used. If provided, it will be ignored. SBST Section 5.2.14 (signature == "SBST") - == Smart Battery Subsystem Table == + + **Smart Battery Subsystem Table** + Optional, not currently supported. SLIC Signature Reserved (signature == "SLIC") - == Software LIcensing table == + + **Software LIcensing table** + Microsoft only table, will not be supported. 
SLIT Section 5.2.17 (signature == "SLIT") - == System Locality distance Information Table == + + **System Locality distance Information Table** + Optional in general, but required for NUMA systems. SPCR Signature Reserved (signature == "SPCR") - == Serial Port Console Redirection table == + + **Serial Port Console Redirection table** + Required for arm64. SPMI Signature Reserved (signature == "SPMI") - == Server Platform Management Interface table == + + **Server Platform Management Interface table** + Optional, not currently supported. SRAT Section 5.2.16 (signature == "SRAT") - == System Resource Affinity Table == + + **System Resource Affinity Table** + Optional, but if used, only the GICC Affinity structures are read. To support arm64 NUMA, this table is required. SSDT Section 5.2.11.2 (signature == "SSDT") - == Secondary System Description Table == + + **Secondary System Description Table** + These tables are a continuation of the DSDT; these are recommended for use with devices that can be added to a running system, but can also serve the purpose of dividing up device descriptions into more @@ -272,49 +365,69 @@ SSDT Section 5.2.11.2 (signature == "SSDT") one DSDT but can contain many SSDTs. STAO Signature Reserved (signature == "STAO") - == _STA Override table == + + **_STA Override table** + Optional, but only necessary in virtualized environments in order to hide devices from guest OSs. TCPA Signature Reserved (signature == "TCPA") - == Trusted Computing Platform Alliance table == + + **Trusted Computing Platform Alliance table** + Optional, not currently supported, and may need changes to fully interoperate with arm64. TPM2 Signature Reserved (signature == "TPM2") - == Trusted Platform Module 2 table == + + **Trusted Platform Module 2 table** + Optional, not currently supported, and may need changes to fully interoperate with arm64. 
UEFI Signature Reserved (signature == "UEFI") - == UEFI ACPI data table == + + **UEFI ACPI data table** + Optional, not currently supported. No known use case for arm64, at present. WAET Signature Reserved (signature == "WAET") - == Windows ACPI Emulated devices Table == + + **Windows ACPI Emulated devices Table** + Microsoft only table, will not be supported. WDAT Signature Reserved (signature == "WDAT") - == Watch Dog Action Table == + + **Watch Dog Action Table** + Microsoft only table, will not be supported. WDRT Signature Reserved (signature == "WDRT") - == Watch Dog Resource Table == + + **Watch Dog Resource Table** + Microsoft only table, will not be supported. WPBT Signature Reserved (signature == "WPBT") - == Windows Platform Binary Table == + + **Windows Platform Binary Table** + Microsoft only table, will not be supported. XENV Signature Reserved (signature == "XENV") - == Xen project table == + + **Xen project table** + Optional, used only by Xen at present. XSDT Section 5.2.8 (signature == "XSDT") - == eXtended System Description Table == - Required for arm64. + **eXtended System Description Table** + + Required for arm64. +====== ======================================================================== ACPI Objects ------------ @@ -323,10 +436,11 @@ shown in the list that follows; any object not explicitly mentioned below should be used as needed for a particular platform or particular subsystem, such as power management or PCI. +===== ================ ======================================================== Name Section Usage for ARMv8 Linux ----- ------------ ------------------------------------------------- +===== ================ ======================================================== _CCA 6.2.17 This method must be defined for all bus masters - on arm64 -- there are no assumptions made about + on arm64 - there are no assumptions made about whether such devices are cache coherent or not. 
The _CCA value is inherited by all descendants of these devices so it does not need to be repeated. @@ -422,8 +536,8 @@ _OSC 6.2.11 This method can be a global method in ACPI (i.e., by the kernel community, then register it with the UEFI Forum. -\_OSI 5.7.2 Deprecated on ARM64. As far as ACPI firmware is - concerned, _OSI is not to be used to determine what +\_OSI 5.7.2 Deprecated on ARM64. As far as ACPI firmware is + concerned, _OSI is not to be used to determine what sort of system is being used or what functionality is provided. The _OSC method is to be used instead. @@ -447,7 +561,7 @@ _PSx 7.3.2-5 Use as needed; power management specific. If _PS0 is usage, change them in these methods. _RDI 8.4.4.4 Recommended for use with processor definitions (_HID - ACPI0010) on arm64. This should only be used in + ACPI0010) on arm64. This should only be used in conjunction with _LPI. \_REV 5.7.4 Always returns the latest version of ACPI supported. @@ -476,6 +590,7 @@ _SWS 7.4.3 Use as needed; power management specific; this may _UID 6.1.12 Recommended for distinguishing devices of the same class; define it if at all possible. +===== ================ ======================================================== @@ -488,7 +603,7 @@ platforms, ACPI events must be signaled differently. There are two options: GPIO-signaled interrupts (Section 5.6.5), and interrupt-signaled events (Section 5.6.9). Interrupt-signaled events are a -new feature in the ACPI 6.1 specification. Either -- or both -- can be used +new feature in the ACPI 6.1 specification. Either - or both - can be used on a given platform, and which to use may be dependent of limitations in any given SoC. If possible, interrupt-signaled events are recommended. @@ -564,39 +679,40 @@ supported. 
The following classes of objects are not supported: - -- Section 9.2: ambient light sensor devices + - Section 9.2: ambient light sensor devices - -- Section 9.3: battery devices + - Section 9.3: battery devices - -- Section 9.4: lids (e.g., laptop lids) + - Section 9.4: lids (e.g., laptop lids) - -- Section 9.8.2: IDE controllers + - Section 9.8.2: IDE controllers - -- Section 9.9: floppy controllers + - Section 9.9: floppy controllers - -- Section 9.10: GPE block devices + - Section 9.10: GPE block devices - -- Section 9.15: PC/AT RTC/CMOS devices + - Section 9.15: PC/AT RTC/CMOS devices - -- Section 9.16: user presence detection devices + - Section 9.16: user presence detection devices - -- Section 9.17: I/O APIC devices; all GICs must be enumerable via MADT + - Section 9.17: I/O APIC devices; all GICs must be enumerable via MADT - -- Section 9.18: time and alarm devices (see 9.15) + - Section 9.18: time and alarm devices (see 9.15) - -- Section 10: power source and power meter devices + - Section 10: power source and power meter devices - -- Section 11: thermal management + - Section 11: thermal management - -- Section 12: embedded controllers interface + - Section 12: embedded controllers interface - -- Section 13: SMBus interfaces + - Section 13: SMBus interfaces This also means that there is no support for the following objects: +==== =========================== ==== ========== Name Section Name Section ----- ------------ ---- ------------ +==== =========================== ==== ========== _ALC 9.3.4 _FDM 9.10.3 _ALI 9.3.2 _FIX 6.2.7 _ALP 9.3.6 _GAI 10.4.5 @@ -619,4 +735,4 @@ _DCK 6.5.2 _UPD 9.16.1 _EC 12.12 _UPP 9.16.2 _FDE 9.10.1 _WPC 10.5.2 _FDI 9.10.2 _WPP 10.5.3 - +==== =========================== ==== ========== diff --git a/Documentation/arm64/arm-acpi.txt b/Documentation/arm64/arm-acpi.rst similarity index 86% rename from Documentation/arm64/arm-acpi.txt rename to Documentation/arm64/arm-acpi.rst index 1a74a041a443..872dbbc73d4a 100644 --- 
a/Documentation/arm64/arm-acpi.txt +++ b/Documentation/arm64/arm-acpi.rst @@ -1,5 +1,7 @@ +===================== ACPI on ARMv8 Servers ---------------------- +===================== + ACPI can be used for ARMv8 general purpose servers designed to follow the ARM SBSA (Server Base System Architecture) [0] and SBBR (Server Base Boot Requirements) [1] specifications. Please note that the SBBR @@ -34,28 +36,28 @@ of the summary text almost directly, to be honest. The short form of the rationale for ACPI on ARM is: --- ACPI’s byte code (AML) allows the platform to encode hardware behavior, +- ACPI’s byte code (AML) allows the platform to encode hardware behavior, while DT explicitly does not support this. For hardware vendors, being able to encode behavior is a key tool used in supporting operating system releases on new hardware. --- ACPI’s OSPM defines a power management model that constrains what the +- ACPI’s OSPM defines a power management model that constrains what the platform is allowed to do into a specific model, while still providing flexibility in hardware design. --- In the enterprise server environment, ACPI has established bindings (such +- In the enterprise server environment, ACPI has established bindings (such as for RAS) which are currently used in production systems. DT does not. Such bindings could be defined in DT at some point, but doing so means ARM and x86 would end up using completely different code paths in both firmware and the kernel. --- Choosing a single interface to describe the abstraction between a platform +- Choosing a single interface to describe the abstraction between a platform and an OS is important. Hardware vendors would not be required to implement both DT and ACPI if they want to support multiple operating systems. And, agreeing on a single interface instead of being fragmented into per OS interfaces makes for better interoperability overall. 
--- The new ACPI governance process works well and Linux is now at the same +- The new ACPI governance process works well and Linux is now at the same table as hardware vendors and other OS vendors. In fact, there is no longer any reason to feel that ACPI only belongs to Windows or that Linux is in any way secondary to Microsoft in this arena. The move of @@ -169,31 +171,31 @@ For the ACPI core to operate properly, and in turn provide the information the kernel needs to configure devices, it expects to find the following tables (all section numbers refer to the ACPI 6.1 specification): - -- RSDP (Root System Description Pointer), section 5.2.5 + - RSDP (Root System Description Pointer), section 5.2.5 - -- XSDT (eXtended System Description Table), section 5.2.8 + - XSDT (eXtended System Description Table), section 5.2.8 - -- FADT (Fixed ACPI Description Table), section 5.2.9 + - FADT (Fixed ACPI Description Table), section 5.2.9 - -- DSDT (Differentiated System Description Table), section + - DSDT (Differentiated System Description Table), section 5.2.11.1 - -- MADT (Multiple APIC Description Table), section 5.2.12 + - MADT (Multiple APIC Description Table), section 5.2.12 - -- GTDT (Generic Timer Description Table), section 5.2.24 + - GTDT (Generic Timer Description Table), section 5.2.24 - -- If PCI is supported, the MCFG (Memory mapped ConFiGuration + - If PCI is supported, the MCFG (Memory mapped ConFiGuration Table), section 5.2.6, specifically Table 5-31. - -- If booting without a console= kernel parameter is + - If booting without a console= kernel parameter is supported, the SPCR (Serial Port Console Redirection table), section 5.2.6, specifically Table 5-31. - -- If necessary to describe the I/O topology, SMMUs and GIC ITSs, + - If necessary to describe the I/O topology, SMMUs and GIC ITSs, the IORT (Input Output Remapping Table, section 5.2.6, specifically Table 5-31). 
- -- If NUMA is supported, the SRAT (System Resource Affinity Table) + - If NUMA is supported, the SRAT (System Resource Affinity Table) and SLIT (System Locality distance Information Table), sections 5.2.16 and 5.2.17, respectively. @@ -269,9 +271,9 @@ describes how to define the structure of an object returned via _DSD, and how specific data structures are defined by specific UUIDs. Linux should only use the _DSD Device Properties UUID [5]: - -- UUID: daffd814-6eba-4d8c-8a91-bc9bbf4aa301 + - UUID: daffd814-6eba-4d8c-8a91-bc9bbf4aa301 - -- http://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf + - http://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf The UEFI Forum provides a mechanism for registering device properties [4] so that they may be used across all operating systems supporting ACPI. @@ -327,10 +329,10 @@ turning a device full off. There are two options for using those Power Resources. They can: - -- be managed in a _PSx method which gets called on entry to power + - be managed in a _PSx method which gets called on entry to power state Dx. - -- be declared separately as power resources with their own _ON and _OFF + - be declared separately as power resources with their own _ON and _OFF methods. They are then tied back to D-states for a particular device via _PRx which specifies which power resources a device needs to be on while in Dx. Kernel then tracks number of devices using a power resource @@ -339,16 +341,16 @@ There are two options for using those Power Resources. They can: The kernel ACPI code will also assume that the _PSx methods follow the normal ACPI rules for such methods: - -- If either _PS0 or _PS3 is implemented, then the other method must also + - If either _PS0 or _PS3 is implemented, then the other method must also be implemented. 
- -- If a device requires usage or setup of a power resource when on, the ASL + - If a device requires usage or setup of a power resource when on, the ASL should organize that it is allocated/enabled using the _PS0 method. - -- Resources allocated or enabled in the _PS0 method should be disabled + - Resources allocated or enabled in the _PS0 method should be disabled or de-allocated in the _PS3 method. - -- Firmware will leave the resources in a reasonable state before handing + - Firmware will leave the resources in a reasonable state before handing over control to the kernel. Such code in _PSx methods will of course be very platform specific. But, @@ -394,52 +396,52 @@ else must be discovered by the driver probe function. Then, have the rest of the driver operate off of the contents of that struct. Doing so should allow most divergence between ACPI and DT functionality to be kept local to the probe function instead of being scattered throughout the driver. For -example: +example:: -static int device_probe_dt(struct platform_device *pdev) -{ - /* DT specific functionality */ - ... -} + static int device_probe_dt(struct platform_device *pdev) + { + /* DT specific functionality */ + ... + } -static int device_probe_acpi(struct platform_device *pdev) -{ - /* ACPI specific functionality */ - ... -} + static int device_probe_acpi(struct platform_device *pdev) + { + /* ACPI specific functionality */ + ... + } -static int device_probe(struct platform_device *pdev) -{ - ... - struct device_node *node = pdev->dev.of_node; - ... + static int device_probe(struct platform_device *pdev) + { + ... + struct device_node *node = pdev->dev.of_node; + ... - if (node) - ret = device_probe_dt(pdev); - else if (ACPI_HANDLE(&pdev->dev)) - ret = device_probe_acpi(pdev); - else - /* other initialization */ - ... - /* Continue with any generic probe operations */ - ... 
-} + if (node) + ret = device_probe_dt(pdev); + else if (ACPI_HANDLE(&pdev->dev)) + ret = device_probe_acpi(pdev); + else + /* other initialization */ + ... + /* Continue with any generic probe operations */ + ... + } DO keep the MODULE_DEVICE_TABLE entries together in the driver to make it clear the different names the driver is probed for, both from DT and from -ACPI: +ACPI:: -static struct of_device_id virtio_mmio_match[] = { - { .compatible = "virtio,mmio", }, - { } -}; -MODULE_DEVICE_TABLE(of, virtio_mmio_match); + static struct of_device_id virtio_mmio_match[] = { + { .compatible = "virtio,mmio", }, + { } + }; + MODULE_DEVICE_TABLE(of, virtio_mmio_match); -static const struct acpi_device_id virtio_mmio_acpi_match[] = { - { "LNRO0005", }, - { } -}; -MODULE_DEVICE_TABLE(acpi, virtio_mmio_acpi_match); + static const struct acpi_device_id virtio_mmio_acpi_match[] = { + { "LNRO0005", }, + { } + }; + MODULE_DEVICE_TABLE(acpi, virtio_mmio_acpi_match); ASWG @@ -471,7 +473,8 @@ Linux Code Individual items specific to Linux on ARM, contained in the Linux source code, are in the list that follows: -ACPI_OS_NAME This macro defines the string to be returned when +ACPI_OS_NAME + This macro defines the string to be returned when an ACPI method invokes the _OS method. On ARM64 systems, this macro will be "Linux" by default. The command line parameter acpi_os= @@ -482,38 +485,44 @@ ACPI_OS_NAME This macro defines the string to be returned when ACPI Objects ------------ Detailed expectations for ACPI tables and objects are listed in the file -Documentation/arm64/acpi_object_usage.txt. +Documentation/arm64/acpi_object_usage.rst.
References ---------- -[0] http://silver.arm.com -- document ARM-DEN-0029, or newer +[0] http://silver.arm.com + document ARM-DEN-0029, or newer: "Server Base System Architecture", version 2.3, dated 27 Mar 2014 [1] http://infocenter.arm.com/help/topic/com.arm.doc.den0044a/Server_Base_Boot_Requirements.pdf Document ARM-DEN-0044A, or newer: "Server Base Boot Requirements, System Software on ARM Platforms", dated 16 Aug 2014 -[2] http://www.secretlab.ca/archives/151, 10 Jan 2015, Copyright (c) 2015, +[2] http://www.secretlab.ca/archives/151, + 10 Jan 2015, Copyright (c) 2015, Linaro Ltd., written by Grant Likely. -[3] AMD ACPI for Seattle platform documentation: +[3] AMD ACPI for Seattle platform documentation http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Seattle_ACPI_Guide.pdf -[4] http://www.uefi.org/acpi -- please see the link for the "ACPI _DSD Device + +[4] http://www.uefi.org/acpi + please see the link for the "ACPI _DSD Device Property Registry Instructions" -[5] http://www.uefi.org/acpi -- please see the link for the "_DSD (Device +[5] http://www.uefi.org/acpi + please see the link for the "_DSD (Device Specific Data) Implementation Guide" -[6] Kernel code for the unified device property interface can be found in +[6] Kernel code for the unified device + property interface can be found in include/linux/property.h and drivers/base/property.c. Authors ------- -Al Stone -Graeme Gregory -Hanjun Guo +- Al Stone +- Graeme Gregory +- Hanjun Guo -Grant Likely , for the "Why ACPI on ARM?" section +- Grant Likely , for the "Why ACPI on ARM?" 
section diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.rst similarity index 86% rename from Documentation/arm64/booting.txt rename to Documentation/arm64/booting.rst index fbab7e21d116..3d041d0d16e8 100644 --- a/Documentation/arm64/booting.txt +++ b/Documentation/arm64/booting.rst @@ -1,7 +1,9 @@ - Booting AArch64 Linux - ===================== +===================== +Booting AArch64 Linux +===================== Author: Will Deacon + Date : 07 September 2012 This document is based on the ARM booting document by Russell King and @@ -12,7 +14,7 @@ The AArch64 exception model is made up of a number of exception levels counterpart. EL2 is the hypervisor level and exists only in non-secure mode. EL3 is the highest priority level and exists only in secure mode. -For the purposes of this document, we will use the term `boot loader' +For the purposes of this document, we will use the term `boot loader` simply to define all software that executes on the CPU(s) before control is passed to the Linux kernel. This may include secure monitor and hypervisor code, or it may just be a handful of instructions for @@ -70,7 +72,7 @@ Image target is available instead. Requirement: MANDATORY -The decompressed kernel image contains a 64-byte header as follows: +The decompressed kernel image contains a 64-byte header as follows:: u32 code0; /* Executable code */ u32 code1; /* Executable code */ @@ -103,19 +105,26 @@ Header notes: - The flags field (introduced in v3.17) is a little-endian 64-bit field composed as follows: - Bit 0: Kernel endianness. 1 if BE, 0 if LE. - Bit 1-2: Kernel Page size. - 0 - Unspecified. - 1 - 4K - 2 - 16K - 3 - 64K - Bit 3: Kernel physical placement - 0 - 2MB aligned base should be as close as possible - to the base of DRAM, since memory below it is not - accessible via the linear mapping - 1 - 2MB aligned base may be anywhere in physical - memory - Bits 4-63: Reserved. 
+ + ============= =============================================================== + Bit 0 Kernel endianness. 1 if BE, 0 if LE. + Bit 1-2 Kernel Page size. + + * 0 - Unspecified. + * 1 - 4K + * 2 - 16K + * 3 - 64K + Bit 3 Kernel physical placement + + 0 + 2MB aligned base should be as close as possible + to the base of DRAM, since memory below it is not + accessible via the linear mapping + 1 + 2MB aligned base may be anywhere in physical + memory + Bits 4-63 Reserved. + ============= =============================================================== - When image_size is zero, a bootloader should attempt to keep as much memory as possible free for use by the kernel immediately after the @@ -147,19 +156,22 @@ Before jumping into the kernel, the following conditions must be met: corrupted by bogus network packets or disk data. This will save you many hours of debug. -- Primary CPU general-purpose register settings - x0 = physical address of device tree blob (dtb) in system RAM. - x1 = 0 (reserved for future use) - x2 = 0 (reserved for future use) - x3 = 0 (reserved for future use) +- Primary CPU general-purpose register settings: + + - x0 = physical address of device tree blob (dtb) in system RAM. + - x1 = 0 (reserved for future use) + - x2 = 0 (reserved for future use) + - x3 = 0 (reserved for future use) - CPU mode + All forms of interrupts must be masked in PSTATE.DAIF (Debug, SError, IRQ and FIQ). The CPU must be in either EL2 (RECOMMENDED in order to have access to the virtualisation extensions) or non-secure EL1. - Caches, MMUs + The MMU must be off. Instruction cache may be on or off. The address range corresponding to the loaded kernel image must be @@ -172,18 +184,21 @@ Before jumping into the kernel, the following conditions must be met: operations (not recommended) must be configured and disabled. - Architected timers + CNTFRQ must be programmed with the timer frequency and CNTVOFF must be programmed with a consistent value on all CPUs. 
If entering the kernel at EL1, CNTHCTL_EL2 must have EL1PCTEN (bit 0) set where available. - Coherency + All CPUs to be booted by the kernel must be part of the same coherency domain on entry to the kernel. This may require IMPLEMENTATION DEFINED initialisation to enable the receiving of maintenance operations on each CPU. - System registers + All writable architected system registers at the exception level where the kernel image will be entered must be initialised by software at a higher exception level to prevent execution in an UNKNOWN state. @@ -195,28 +210,40 @@ Before jumping into the kernel, the following conditions must be met: For systems with a GICv3 interrupt controller to be used in v3 mode: - If EL3 is present: - ICC_SRE_EL3.Enable (bit 3) must be initialised to 0b1. - ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b1. + + - ICC_SRE_EL3.Enable (bit 3) must be initialised to 0b1. + - ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b1. + - If the kernel is entered at EL1: - ICC_SRE_EL2.Enable (bit 3) must be initialised to 0b1 - ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b1. + + - ICC_SRE_EL2.Enable (bit 3) must be initialised to 0b1 + - ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b1. + - The DT or ACPI tables must describe a GICv3 interrupt controller. For systems with a GICv3 interrupt controller to be used in compatibility (v2) mode: + - If EL3 is present: - ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b0. + + ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b0. + - If the kernel is entered at EL1: - ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b0. + + ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b0. + - The DT or ACPI tables must describe a GICv2 interrupt controller.
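As a rough illustration of the GICv3 rules above, the following user-space sketch checks a raw ICC_SRE_EL3 or ICC_SRE_EL2 value against the bit requirements for each mode. The enum and the helper names are hypothetical and are not kernel or firmware interfaces; only the bit positions (SRE is bit 0, Enable is bit 3) come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as quoted in the requirements above. */
#define ICC_SRE_SRE_BIT     0	/* ICC_SRE_ELx.SRE */
#define ICC_SRE_ENABLE_BIT  3	/* ICC_SRE_ELx.Enable */

/* Hypothetical mode selector for this sketch, not a kernel type. */
enum gic_boot_mode { GIC_BOOT_V3, GIC_BOOT_V2_COMPAT };

/* Return true if the given bit of a raw register value is 1. */
static bool bit_is_set(uint64_t reg, unsigned int bit)
{
	assert(bit < 64);
	return (reg >> bit) & 1;
}

/*
 * Hypothetical checker: true if a raw ICC_SRE_EL3 or ICC_SRE_EL2 value
 * satisfies the rules above for the chosen mode.
 */
static bool icc_sre_matches_mode(uint64_t icc_sre, enum gic_boot_mode mode)
{
	if (mode == GIC_BOOT_V3)
		return bit_is_set(icc_sre, ICC_SRE_ENABLE_BIT) &&
		       bit_is_set(icc_sre, ICC_SRE_SRE_BIT);

	/* Compatibility (v2) mode only constrains SRE, which must be 0. */
	return !bit_is_set(icc_sre, ICC_SRE_SRE_BIT);
}
```

For instance, a value of 0x9 (Enable and SRE both set) would pass the v3 check, while 0x8 (SRE clear) would only be acceptable for compatibility mode.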
For CPUs with pointer authentication functionality: - If EL3 is present: - SCR_EL3.APK (bit 16) must be initialised to 0b1 - SCR_EL3.API (bit 17) must be initialised to 0b1 + + - SCR_EL3.APK (bit 16) must be initialised to 0b1 + - SCR_EL3.API (bit 17) must be initialised to 0b1 + - If the kernel is entered at EL1: - HCR_EL2.APK (bit 40) must be initialised to 0b1 - HCR_EL2.API (bit 41) must be initialised to 0b1 + + - HCR_EL2.APK (bit 40) must be initialised to 0b1 + - HCR_EL2.API (bit 41) must be initialised to 0b1 The requirements described above for CPU mode, caches, MMUs, architected timers, coherency and system registers apply to all CPUs. All CPUs must diff --git a/Documentation/arm64/cpu-feature-registers.txt b/Documentation/arm64/cpu-feature-registers.rst similarity index 65% rename from Documentation/arm64/cpu-feature-registers.txt rename to Documentation/arm64/cpu-feature-registers.rst index 684a0da39378..2955287e9acc 100644 --- a/Documentation/arm64/cpu-feature-registers.txt +++ b/Documentation/arm64/cpu-feature-registers.rst @@ -1,5 +1,6 @@ - ARM64 CPU Feature Registers - =========================== +=========================== +ARM64 CPU Feature Registers +=========================== Author: Suzuki K Poulose @@ -9,7 +10,7 @@ registers to userspace. The availability of this ABI is advertised via the HWCAP_CPUID in HWCAPs. 1. Motivation ---------------- +------------- The ARM architecture defines a set of feature registers, which describe the capabilities of the CPU/system. Access to these system registers is @@ -33,9 +34,10 @@ there are some issues with their usage. 2. Requirements ------------------ +--------------- + + a) Safety: - a) Safety : Applications should be able to use the information provided by the infrastructure to run safely across the system. This has greater implications on a system with heterogeneous CPUs. @@ -47,7 +49,8 @@ there are some issues with their usage. 
Otherwise an application could crash when scheduled on the CPU which doesn't support CRC32. - b) Security : + b) Security: + Applications should only be able to receive information that is relevant to the normal operation in userspace. Hence, some of the fields are masked out (i.e., made invisible) and their values are set to @@ -58,10 +61,12 @@ there are some issues with their usage. (even when the CPU provides it). c) Implementation Defined Features + The infrastructure doesn't expose any register which is IMPLEMENTATION DEFINED as per ARMv8-A Architecture. - d) CPU Identification : + d) CPU Identification: + MIDR_EL1 is exposed to help identify the processor. On a heterogeneous system, this could be racy (just like getcpu()). The process could be migrated to another CPU by the time it uses the @@ -70,7 +75,7 @@ there are some issues with their usage. currently executing on. The REVIDR is not exposed due to this constraint, as REVIDR makes sense only in conjunction with the MIDR. Alternatively, MIDR_EL1 and REVIDR_EL1 are exposed via sysfs - at: + at:: /sys/devices/system/cpu/cpu$ID/regs/identification/ \- midr @@ -85,7 +90,8 @@ exception and ends up in SIGILL being delivered to the process. The infrastructure hooks into the exception handler and emulates the operation if the source belongs to the supported system register space.
-The infrastructure emulates only the following system register space: +The infrastructure emulates only the following system register space:: + Op0=3, Op1=0, CRn=0, CRm=0,4,5,6,7 (See Table C5-6 'System instruction encodings for non-Debug System @@ -107,73 +113,76 @@ infrastructure: ------------------------------------------- 1) ID_AA64ISAR0_EL1 - Instruction Set Attribute Register 0 - x--------------------------------------------------x + + +------------------------------+---------+---------+ | Name | bits | visible | - |--------------------------------------------------| + +------------------------------+---------+---------+ | TS | [55-52] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | FHM | [51-48] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | DP | [47-44] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | SM4 | [43-40] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | SM3 | [39-36] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | SHA3 | [35-32] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | RDM | [31-28] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | ATOMICS | [23-20] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | CRC32 | [19-16] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | SHA2 | [15-12] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | SHA1 | [11-8] | y | - 
|--------------------------------------------------| + +------------------------------+---------+---------+ | AES | [7-4] | y | - x--------------------------------------------------x + +------------------------------+---------+---------+ 2) ID_AA64PFR0_EL1 - Processor Feature Register 0 - x--------------------------------------------------x + + +------------------------------+---------+---------+ | Name | bits | visible | - |--------------------------------------------------| + +------------------------------+---------+---------+ | DIT | [51-48] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | SVE | [35-32] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | GIC | [27-24] | n | - |--------------------------------------------------| + +------------------------------+---------+---------+ | AdvSIMD | [23-20] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | FP | [19-16] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | EL3 | [15-12] | n | - |--------------------------------------------------| + +------------------------------+---------+---------+ | EL2 | [11-8] | n | - |--------------------------------------------------| + +------------------------------+---------+---------+ | EL1 | [7-4] | n | - |--------------------------------------------------| + +------------------------------+---------+---------+ | EL0 | [3-0] | n | - x--------------------------------------------------x + +------------------------------+---------+---------+ 3) MIDR_EL1 - Main ID Register - x--------------------------------------------------x + + +------------------------------+---------+---------+ | Name | bits | visible | - |--------------------------------------------------| + 
+------------------------------+---------+---------+ | Implementer | [31-24] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | Variant | [23-20] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | Architecture | [19-16] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | PartNum | [15-4] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | Revision | [3-0] | y | - x--------------------------------------------------x + +------------------------------+---------+---------+ NOTE: The 'visible' fields of MIDR_EL1 will contain the value as available on the CPU where it is fetched and is not a system @@ -181,90 +190,92 @@ infrastructure: 4) ID_AA64ISAR1_EL1 - Instruction set attribute register 1 - x--------------------------------------------------x + +------------------------------+---------+---------+ | Name | bits | visible | - |--------------------------------------------------| + +------------------------------+---------+---------+ | GPI | [31-28] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | GPA | [27-24] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | LRCPC | [23-20] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | FCMA | [19-16] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | JSCVT | [15-12] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | API | [11-8] | y | - |--------------------------------------------------| + 
+------------------------------+---------+---------+ | APA | [7-4] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | DPB | [3-0] | y | - x--------------------------------------------------x + +------------------------------+---------+---------+ 5) ID_AA64MMFR2_EL1 - Memory model feature register 2 - x--------------------------------------------------x + +------------------------------+---------+---------+ | Name | bits | visible | - |--------------------------------------------------| + +------------------------------+---------+---------+ | AT | [35-32] | y | - x--------------------------------------------------x + +------------------------------+---------+---------+ 6) ID_AA64ZFR0_EL1 - SVE feature ID register 0 - x--------------------------------------------------x + +------------------------------+---------+---------+ | Name | bits | visible | - |--------------------------------------------------| + +------------------------------+---------+---------+ | SM4 | [43-40] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | SHA3 | [35-32] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | BitPerm | [19-16] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | AES | [7-4] | y | - |--------------------------------------------------| + +------------------------------+---------+---------+ | SVEVer | [3-0] | y | - x--------------------------------------------------x + +------------------------------+---------+---------+ Appendix I: Example ---------------------------- +------------------- -/* - * Sample program to demonstrate the MRS emulation ABI. 
- * - * Copyright (C) 2015-2016, ARM Ltd - * - * Author: Suzuki K Poulose - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - */ +:: -#include <asm/hwcap.h> -#include <stdio.h> -#include <sys/auxv.h> + /* + * Sample program to demonstrate the MRS emulation ABI. + * + * Copyright (C) 2015-2016, ARM Ltd + * + * Author: Suzuki K Poulose + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the + * GNU General Public License for more details. + */ -#define get_cpu_ftr(id) ({ \ + #include <asm/hwcap.h> + #include <stdio.h> + #include <sys/auxv.h> + + #define get_cpu_ftr(id) ({ \ unsigned long __val; \ asm("mrs %0, "#id : "=r" (__val)); \ printf("%-20s: 0x%016lx\n", #id, __val); \ }) -int main(void) -{ + int main(void) + { if (!(getauxval(AT_HWCAP) & HWCAP_CPUID)) { fputs("CPUID registers unavailable\n", stderr); @@ -284,13 +295,10 @@ int main(void) get_cpu_ftr(MPIDR_EL1); get_cpu_ftr(REVIDR_EL1); -#if 0 + #if 0 /* Unexposed register access causes SIGILL */ get_cpu_ftr(ID_MMFR0_EL1); -#endif + #endif return 0; -} - - - + } diff --git a/Documentation/arm64/elf_hwcaps.txt b/Documentation/arm64/elf_hwcaps.rst similarity index 92% rename from Documentation/arm64/elf_hwcaps.txt rename to Documentation/arm64/elf_hwcaps.rst index b73a2519ecf2..c7cbf4b571c0 100644 --- a/Documentation/arm64/elf_hwcaps.txt +++ b/Documentation/arm64/elf_hwcaps.rst @@ -1,3 +1,4 @@ +================ ARM64 ELF hwcaps ================ @@ -15,16 +16,16 @@ of flags called hwcaps, exposed in the auxiliary vector. Userspace software can test for features by acquiring the AT_HWCAP or AT_HWCAP2 entry of the auxiliary vector, and testing whether the relevant -flags are set, e.g. +flags are set, e.g.:: -bool floating_point_is_present(void) -{ - unsigned long hwcaps = getauxval(AT_HWCAP); - if (hwcaps & HWCAP_FP) - return true; + bool floating_point_is_present(void) + { + unsigned long hwcaps = getauxval(AT_HWCAP); + if (hwcaps & HWCAP_FP) + return true; - return false; -} + return false; + } Where software relies on a feature described by a hwcap, it should check the relevant hwcap flag to verify that the feature is present before @@ -45,7 +46,7 @@ userspace code at EL0. These hwcaps are defined in terms of ID register fields, and should be interpreted with reference to the definition of these fields in the ARM Architecture Reference Manual (ARM ARM).
-Such hwcaps are described below in the form: +Such hwcaps are described below in the form:: Functionality implied by idreg.field == val. @@ -64,75 +65,58 @@ reference to ID registers, and may refer to other documentation. --------------------------------- HWCAP_FP - Functionality implied by ID_AA64PFR0_EL1.FP == 0b0000. HWCAP_ASIMD - Functionality implied by ID_AA64PFR0_EL1.AdvSIMD == 0b0000. HWCAP_EVTSTRM - The generic timer is configured to generate events at a frequency of approximately 100KHz. HWCAP_AES - Functionality implied by ID_AA64ISAR0_EL1.AES == 0b0001. HWCAP_PMULL - Functionality implied by ID_AA64ISAR0_EL1.AES == 0b0010. HWCAP_SHA1 - Functionality implied by ID_AA64ISAR0_EL1.SHA1 == 0b0001. HWCAP_SHA2 - Functionality implied by ID_AA64ISAR0_EL1.SHA2 == 0b0001. HWCAP_CRC32 - Functionality implied by ID_AA64ISAR0_EL1.CRC32 == 0b0001. HWCAP_ATOMICS - Functionality implied by ID_AA64ISAR0_EL1.Atomic == 0b0010. HWCAP_FPHP - Functionality implied by ID_AA64PFR0_EL1.FP == 0b0001. HWCAP_ASIMDHP - Functionality implied by ID_AA64PFR0_EL1.AdvSIMD == 0b0001. HWCAP_CPUID - EL0 access to certain ID registers is available, to the extent - described by Documentation/arm64/cpu-feature-registers.txt. + described by Documentation/arm64/cpu-feature-registers.rst. These ID registers may imply the availability of features. HWCAP_ASIMDRDM - Functionality implied by ID_AA64ISAR0_EL1.RDM == 0b0001. HWCAP_JSCVT - Functionality implied by ID_AA64ISAR1_EL1.JSCVT == 0b0001. HWCAP_FCMA - Functionality implied by ID_AA64ISAR1_EL1.FCMA == 0b0001. HWCAP_LRCPC - Functionality implied by ID_AA64ISAR1_EL1.LRCPC == 0b0001. HWCAP_DCPOP - Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0001. HWCAP2_DCPODP @@ -140,27 +124,21 @@ HWCAP2_DCPODP Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0010. HWCAP_SHA3 - Functionality implied by ID_AA64ISAR0_EL1.SHA3 == 0b0001. HWCAP_SM3 - Functionality implied by ID_AA64ISAR0_EL1.SM3 == 0b0001. 
HWCAP_SM4 - Functionality implied by ID_AA64ISAR0_EL1.SM4 == 0b0001. HWCAP_ASIMDDP - Functionality implied by ID_AA64ISAR0_EL1.DP == 0b0001. HWCAP_SHA512 - Functionality implied by ID_AA64ISAR0_EL1.SHA2 == 0b0010. HWCAP_SVE - Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001. HWCAP2_SVE2 @@ -188,40 +166,32 @@ HWCAP2_SVESM4 Functionality implied by ID_AA64ZFR0_EL1.SM4 == 0b0001. HWCAP_ASIMDFHM - Functionality implied by ID_AA64ISAR0_EL1.FHM == 0b0001. HWCAP_DIT - Functionality implied by ID_AA64PFR0_EL1.DIT == 0b0001. HWCAP_USCAT - Functionality implied by ID_AA64MMFR2_EL1.AT == 0b0001. HWCAP_ILRCPC - Functionality implied by ID_AA64ISAR1_EL1.LRCPC == 0b0010. HWCAP_FLAGM - Functionality implied by ID_AA64ISAR0_EL1.TS == 0b0001. HWCAP_SSBS - Functionality implied by ID_AA64PFR1_EL1.SSBS == 0b0010. HWCAP_PACA - Functionality implied by ID_AA64ISAR1_EL1.APA == 0b0001 or ID_AA64ISAR1_EL1.API == 0b0001, as described by - Documentation/arm64/pointer-authentication.txt. + Documentation/arm64/pointer-authentication.rst. HWCAP_PACG - Functionality implied by ID_AA64ISAR1_EL1.GPA == 0b0001 or ID_AA64ISAR1_EL1.GPI == 0b0001, as described by - Documentation/arm64/pointer-authentication.txt. + Documentation/arm64/pointer-authentication.rst. 4. Unused AT_HWCAP bits diff --git a/Documentation/arm64/hugetlbpage.txt b/Documentation/arm64/hugetlbpage.rst similarity index 86% rename from Documentation/arm64/hugetlbpage.txt rename to Documentation/arm64/hugetlbpage.rst index cfae87dc653b..b44f939e5210 100644 --- a/Documentation/arm64/hugetlbpage.txt +++ b/Documentation/arm64/hugetlbpage.rst @@ -1,3 +1,4 @@ +==================== HugeTLBpage on ARM64 ==================== @@ -31,8 +32,10 @@ and level of the page table. 
The following hugepage sizes are supported - - CONT PTE PMD CONT PMD PUD - -------- --- -------- --- + ====== ======== ==== ======== === + - CONT PTE PMD CONT PMD PUD + ====== ======== ==== ======== === 4K: 64K 2M 32M 1G 16K: 2M 32M 1G 64K: 2M 512M 16G + ====== ======== ==== ======== === diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst new file mode 100644 index 000000000000..018b7836ecb7 --- /dev/null +++ b/Documentation/arm64/index.rst @@ -0,0 +1,28 @@ +:orphan: + +================== +ARM64 Architecture +================== + +.. toctree:: + :maxdepth: 1 + + acpi_object_usage + arm-acpi + booting + cpu-feature-registers + elf_hwcaps + hugetlbpage + legacy_instructions + memory + pointer-authentication + silicon-errata + sve + tagged-pointers + +.. only:: subproject and html + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/arm64/legacy_instructions.txt b/Documentation/arm64/legacy_instructions.rst similarity index 73% rename from Documentation/arm64/legacy_instructions.txt rename to Documentation/arm64/legacy_instructions.rst index 01bf3d9fac85..54401b22cb8f 100644 --- a/Documentation/arm64/legacy_instructions.txt +++ b/Documentation/arm64/legacy_instructions.rst @@ -1,3 +1,7 @@ +=================== +Legacy instructions +=================== + The arm64 port of the Linux kernel provides infrastructure to support emulation of instructions which have been deprecated, or obsoleted in the architecture. The infrastructure code uses undefined instruction @@ -9,19 +13,22 @@ The emulation mode can be controlled by writing to sysctl nodes behaviours and the corresponding values of the sysctl nodes - * Undef - Value: 0 + Value: 0 + Generates undefined instruction abort. Default for instructions that have been obsoleted in the architecture, e.g., SWP * Emulate - Value: 1 + Value: 1 + Uses software emulation. 
To aid migration of software, in this mode usage of emulated instruction is traced as well as rate limited warnings are issued. This is the default for deprecated instructions, .e.g., CP15 barriers * Hardware Execution - Value: 2 + Value: 2 + Although marked as deprecated, some implementations may support the enabling/disabling of hardware support for the execution of these instructions. Using hardware execution generally provides better @@ -38,20 +45,24 @@ individual instruction notes for further information. Supported legacy instructions ----------------------------- * SWP{B} -Node: /proc/sys/abi/swp -Status: Obsolete -Default: Undef (0) + +:Node: /proc/sys/abi/swp +:Status: Obsolete +:Default: Undef (0) * CP15 Barriers -Node: /proc/sys/abi/cp15_barrier -Status: Deprecated -Default: Emulate (1) + +:Node: /proc/sys/abi/cp15_barrier +:Status: Deprecated +:Default: Emulate (1) * SETEND -Node: /proc/sys/abi/setend -Status: Deprecated -Default: Emulate (1)* -Note: All the cpus on the system must have mixed endian support at EL0 -for this feature to be enabled. If a new CPU - which doesn't support mixed -endian - is hotplugged in after this feature has been enabled, there could -be unexpected results in the application. + +:Node: /proc/sys/abi/setend +:Status: Deprecated +:Default: Emulate (1)* + + Note: All the cpus on the system must have mixed endian support at EL0 + for this feature to be enabled. If a new CPU - which doesn't support mixed + endian - is hotplugged in after this feature has been enabled, there could + be unexpected results in the application. diff --git a/Documentation/arm64/memory.rst b/Documentation/arm64/memory.rst new file mode 100644 index 000000000000..464b880fc4b7 --- /dev/null +++ b/Documentation/arm64/memory.rst @@ -0,0 +1,98 @@ +============================== +Memory Layout on AArch64 Linux +============================== + +Author: Catalin Marinas + +This document describes the virtual memory layout used by the AArch64 +Linux kernel. 
The architecture allows up to 4 levels of translation +tables with a 4KB page size and up to 3 levels with a 64KB page size. + +AArch64 Linux uses either 3 levels or 4 levels of translation tables +with the 4KB page configuration, allowing 39-bit (512GB) or 48-bit +(256TB) virtual addresses, respectively, for both user and kernel. With +64KB pages, only 2 levels of translation tables, allowing 42-bit (4TB) +virtual address, are used but the memory layout is the same. + +User addresses have bits 63:48 set to 0 while the kernel addresses have +the same bits set to 1. TTBRx selection is given by bit 63 of the +virtual address. The swapper_pg_dir contains only kernel (global) +mappings while the user pgd contains only user (non-global) mappings. +The swapper_pg_dir address is written to TTBR1 and never written to +TTBR0. + + +AArch64 Linux memory layout with 4KB pages + 3 levels:: + + Start End Size Use + ----------------------------------------------------------------------- + 0000000000000000 0000007fffffffff 512GB user + ffffff8000000000 ffffffffffffffff 512GB kernel + + +AArch64 Linux memory layout with 4KB pages + 4 levels:: + + Start End Size Use + ----------------------------------------------------------------------- + 0000000000000000 0000ffffffffffff 256TB user + ffff000000000000 ffffffffffffffff 256TB kernel + + +AArch64 Linux memory layout with 64KB pages + 2 levels:: + + Start End Size Use + ----------------------------------------------------------------------- + 0000000000000000 000003ffffffffff 4TB user + fffffc0000000000 ffffffffffffffff 4TB kernel + + +AArch64 Linux memory layout with 64KB pages + 3 levels:: + + Start End Size Use + ----------------------------------------------------------------------- + 0000000000000000 0000ffffffffffff 256TB user + ffff000000000000 ffffffffffffffff 256TB kernel + + +For details of the virtual kernel memory layout please see the kernel +booting log. 
+ + +Translation table lookup with 4KB pages:: + + +--------+--------+--------+--------+--------+--------+--------+--------+ + |63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| + +--------+--------+--------+--------+--------+--------+--------+--------+ + | | | | | | + | | | | | v + | | | | | [11:0] in-page offset + | | | | +-> [20:12] L3 index + | | | +-----------> [29:21] L2 index + | | +---------------------> [38:30] L1 index + | +-------------------------------> [47:39] L0 index + +-------------------------------------------------> [63] TTBR0/1 + + +Translation table lookup with 64KB pages:: + + +--------+--------+--------+--------+--------+--------+--------+--------+ + |63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| + +--------+--------+--------+--------+--------+--------+--------+--------+ + | | | | | + | | | | v + | | | | [15:0] in-page offset + | | | +----------> [28:16] L3 index + | | +--------------------------> [41:29] L2 index + | +-------------------------------> [47:42] L1 index + +-------------------------------------------------> [63] TTBR0/1 + + +When using KVM without the Virtualization Host Extensions, the +hypervisor maps kernel pages in EL2 at a fixed (and potentially +random) offset from the linear mapping. See the kern_hyp_va macro and +kvm_update_va_mask function for more details. MMIO devices such as +GICv2 gets mapped next to the HYP idmap page, as do vectors when +ARM64_HARDEN_EL2_VECTORS is selected for particular CPUs. + +When using KVM with the Virtualization Host Extensions, no additional +mappings are created, since the host kernel runs directly in EL2. diff --git a/Documentation/arm64/memory.txt b/Documentation/arm64/memory.txt deleted file mode 100644 index c5dab30d3389..000000000000 --- a/Documentation/arm64/memory.txt +++ /dev/null @@ -1,97 +0,0 @@ - Memory Layout on AArch64 Linux - ============================== - -Author: Catalin Marinas - -This document describes the virtual memory layout used by the AArch64 -Linux kernel. 
The architecture allows up to 4 levels of translation -tables with a 4KB page size and up to 3 levels with a 64KB page size. - -AArch64 Linux uses either 3 levels or 4 levels of translation tables -with the 4KB page configuration, allowing 39-bit (512GB) or 48-bit -(256TB) virtual addresses, respectively, for both user and kernel. With -64KB pages, only 2 levels of translation tables, allowing 42-bit (4TB) -virtual address, are used but the memory layout is the same. - -User addresses have bits 63:48 set to 0 while the kernel addresses have -the same bits set to 1. TTBRx selection is given by bit 63 of the -virtual address. The swapper_pg_dir contains only kernel (global) -mappings while the user pgd contains only user (non-global) mappings. -The swapper_pg_dir address is written to TTBR1 and never written to -TTBR0. - - -AArch64 Linux memory layout with 4KB pages + 3 levels: - -Start End Size Use ------------------------------------------------------------------------ -0000000000000000 0000007fffffffff 512GB user -ffffff8000000000 ffffffffffffffff 512GB kernel - - -AArch64 Linux memory layout with 4KB pages + 4 levels: - -Start End Size Use ------------------------------------------------------------------------ -0000000000000000 0000ffffffffffff 256TB user -ffff000000000000 ffffffffffffffff 256TB kernel - - -AArch64 Linux memory layout with 64KB pages + 2 levels: - -Start End Size Use ------------------------------------------------------------------------ -0000000000000000 000003ffffffffff 4TB user -fffffc0000000000 ffffffffffffffff 4TB kernel - - -AArch64 Linux memory layout with 64KB pages + 3 levels: - -Start End Size Use ------------------------------------------------------------------------ -0000000000000000 0000ffffffffffff 256TB user -ffff000000000000 ffffffffffffffff 256TB kernel - - -For details of the virtual kernel memory layout please see the kernel -booting log. 
- - -Translation table lookup with 4KB pages: - -+--------+--------+--------+--------+--------+--------+--------+--------+ -|63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| -+--------+--------+--------+--------+--------+--------+--------+--------+ - | | | | | | - | | | | | v - | | | | | [11:0] in-page offset - | | | | +-> [20:12] L3 index - | | | +-----------> [29:21] L2 index - | | +---------------------> [38:30] L1 index - | +-------------------------------> [47:39] L0 index - +-------------------------------------------------> [63] TTBR0/1 - - -Translation table lookup with 64KB pages: - -+--------+--------+--------+--------+--------+--------+--------+--------+ -|63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| -+--------+--------+--------+--------+--------+--------+--------+--------+ - | | | | | - | | | | v - | | | | [15:0] in-page offset - | | | +----------> [28:16] L3 index - | | +--------------------------> [41:29] L2 index - | +-------------------------------> [47:42] L1 index - +-------------------------------------------------> [63] TTBR0/1 - - -When using KVM without the Virtualization Host Extensions, the -hypervisor maps kernel pages in EL2 at a fixed (and potentially -random) offset from the linear mapping. See the kern_hyp_va macro and -kvm_update_va_mask function for more details. MMIO devices such as -GICv2 gets mapped next to the HYP idmap page, as do vectors when -ARM64_HARDEN_EL2_VECTORS is selected for particular CPUs. - -When using KVM with the Virtualization Host Extensions, no additional -mappings are created, since the host kernel runs directly in EL2. 
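Note on the 4KB translation table lookup diagram carried by memory.rst: the field boundaries shown there ([63] TTBR0/1 select, [47:39] L0, [38:30] L1, [29:21] L2, [20:12] L3, [11:0] in-page offset) can be sketched in C as below. This is only an illustrative sketch. `struct va_fields` and `split_va_4k()` are hypothetical names made up for this note and are not kernel interfaces. It assumes the 4KB page, 4 level configuration, where each table index is 9 bits wide.

```c
#include <stdint.h>

/* Hypothetical container for the fields of a virtual address (4KB pages). */
struct va_fields {
	unsigned int ttbr;	/* bit 63: 0 selects TTBR0 (user), 1 selects TTBR1 (kernel) */
	unsigned int l0;	/* bits [47:39], L0 index */
	unsigned int l1;	/* bits [38:30], L1 index */
	unsigned int l2;	/* bits [29:21], L2 index */
	unsigned int l3;	/* bits [20:12], L3 index */
	unsigned int offset;	/* bits [11:0], in-page offset */
};

/* Split a 64-bit virtual address following the 4KB lookup diagram. */
static struct va_fields split_va_4k(uint64_t va)
{
	struct va_fields f;

	f.ttbr   = (unsigned int)(va >> 63);
	f.l0     = (unsigned int)((va >> 39) & 0x1ff);	/* 9-bit index */
	f.l1     = (unsigned int)((va >> 30) & 0x1ff);
	f.l2     = (unsigned int)((va >> 21) & 0x1ff);
	f.l3     = (unsigned int)((va >> 12) & 0x1ff);
	f.offset = (unsigned int)(va & 0xfff);		/* 12-bit offset */

	return f;
}
```

For example, with this sketch the address 0xffff000000001234 has bit 63 set (TTBR1), L0 to L2 indices of 0, an L3 index of 1 and an in-page offset of 0x234.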
diff --git a/Documentation/arm64/pointer-authentication.txt b/Documentation/arm64/pointer-authentication.rst similarity index 99% rename from Documentation/arm64/pointer-authentication.txt rename to Documentation/arm64/pointer-authentication.rst index fc71b33de87e..30b2ab06526b 100644 --- a/Documentation/arm64/pointer-authentication.txt +++ b/Documentation/arm64/pointer-authentication.rst @@ -1,7 +1,9 @@ +======================================= Pointer authentication in AArch64 Linux ======================================= Author: Mark Rutland + Date: 2017-07-19 This document briefly describes the provision of pointer authentication diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.rst similarity index 55% rename from Documentation/arm64/silicon-errata.txt rename to Documentation/arm64/silicon-errata.rst index 2735462d5958..c792774be59e 100644 --- a/Documentation/arm64/silicon-errata.txt +++ b/Documentation/arm64/silicon-errata.rst @@ -1,7 +1,9 @@ - Silicon Errata and Software Workarounds - ======================================= +======================================= +Silicon Errata and Software Workarounds +======================================= Author: Will Deacon + Date : 27 November 2015 It is an unfortunate fact of life that hardware is often produced with @@ -9,11 +11,13 @@ so-called "errata", which can cause it to deviate from the architecture under specific circumstances. For hardware produced by ARM, these errata are broadly classified into the following categories: - Category A: A critical error without a viable workaround. - Category B: A significant or critical error with an acceptable + ========== ======================================================== + Category A A critical error without a viable workaround. + Category B A significant or critical error with an acceptable workaround. 
- Category C: A minor error that is not expected to occur under normal + Category C A minor error that is not expected to occur under normal operation. + ========== ======================================================== For more information, consult one of the "Software Developers Errata Notice" documents available on infocenter.arm.com (registration @@ -42,47 +46,86 @@ file acts as a registry of software workarounds in the Linux Kernel and will be updated when new workarounds are committed and backported to stable kernels. -| Implementor | Component | Erratum ID | Kconfig | +----------------+-----------------+-----------------+-----------------------------+ +| Implementor | Component | Erratum ID | Kconfig | ++================+=================+=================+=============================+ | Allwinner | A64/R18 | UNKNOWN1 | SUN50I_ERRATUM_UNKNOWN1 | -| | | | | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A53 | #826319 | ARM64_ERRATUM_826319 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A53 | #827319 | ARM64_ERRATUM_827319 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A53 | #824069 | ARM64_ERRATUM_824069 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A53 | #819472 | ARM64_ERRATUM_819472 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A53 | #845719 | ARM64_ERRATUM_845719 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A53 | #843419 | ARM64_ERRATUM_843419 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A57 | #832075 | ARM64_ERRATUM_832075 | 
++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A57 | #852523 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A57 | #834220 | ARM64_ERRATUM_834220 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A72 | #853709 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A73 | #858921 | ARM64_ERRATUM_858921 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A55 | #1024718 | ARM64_ERRATUM_1024718 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A76 | #1188873,1418040| ARM64_ERRATUM_1418040 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A76 | #1165522 | ARM64_ERRATUM_1165522 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A76 | #1286807 | ARM64_ERRATUM_1286807 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A76 | #1463225 | ARM64_ERRATUM_1463225 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | MMU-500 | #841119,826419 | N/A | -| | | | | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ | Cavium | ThunderX ITS | #22375,24313 | CAVIUM_ERRATUM_22375 | ++----------------+-----------------+-----------------+-----------------------------+ | Cavium | ThunderX ITS | #23144 | CAVIUM_ERRATUM_23144 | ++----------------+-----------------+-----------------+-----------------------------+ | 
Cavium | ThunderX GICv3 | #23154 | CAVIUM_ERRATUM_23154 | ++----------------+-----------------+-----------------+-----------------------------+ | Cavium | ThunderX Core | #27456 | CAVIUM_ERRATUM_27456 | ++----------------+-----------------+-----------------+-----------------------------+ | Cavium | ThunderX Core | #30115 | CAVIUM_ERRATUM_30115 | ++----------------+-----------------+-----------------+-----------------------------+ | Cavium | ThunderX SMMUv2 | #27704 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ | Cavium | ThunderX2 SMMUv3| #74 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ | Cavium | ThunderX2 SMMUv3| #126 | N/A | -| | | | | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ | Freescale/NXP | LS2080A/LS1043A | A-008585 | FSL_ERRATUM_A008585 | -| | | | | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ | Hisilicon | Hip0{5,6,7} | #161010101 | HISILICON_ERRATUM_161010101 | ++----------------+-----------------+-----------------+-----------------------------+ | Hisilicon | Hip0{6,7} | #161010701 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ | Hisilicon | Hip07 | #161600802 | HISILICON_ERRATUM_161600802 | ++----------------+-----------------+-----------------+-----------------------------+ | Hisilicon | Hip08 SMMU PMCG | #162001800 | N/A | -| | | | | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ | Qualcomm Tech. 
| Kryo/Falkor v1 | E1003 | QCOM_FALKOR_ERRATUM_1003 | ++----------------+-----------------+-----------------+-----------------------------+ | Qualcomm Tech. | Falkor v1 | E1009 | QCOM_FALKOR_ERRATUM_1009 | ++----------------+-----------------+-----------------+-----------------------------+ | Qualcomm Tech. | QDF2400 ITS | E0065 | QCOM_QDF2400_ERRATUM_0065 | ++----------------+-----------------+-----------------+-----------------------------+ | Qualcomm Tech. | Falkor v{1,2} | E1041 | QCOM_FALKOR_ERRATUM_1041 | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ | Fujitsu | A64FX | E#010001 | FUJITSU_ERRATUM_010001 | ++----------------+-----------------+-----------------+-----------------------------+ diff --git a/Documentation/arm64/sve.txt b/Documentation/arm64/sve.rst similarity index 98% rename from Documentation/arm64/sve.txt rename to Documentation/arm64/sve.rst index 9940e924a47e..38422ab249dd 100644 --- a/Documentation/arm64/sve.txt +++ b/Documentation/arm64/sve.rst @@ -1,7 +1,9 @@ - Scalable Vector Extension support for AArch64 Linux - =================================================== +=================================================== +Scalable Vector Extension support for AArch64 Linux +=================================================== Author: Dave Martin + Date: 4 August 2017 This document outlines briefly the interface provided to userspace by Linux in @@ -426,7 +428,7 @@ In A64 state, SVE adds the following: * FPSR and FPCR are retained from ARMv8-A, and interact with SVE floating-point operations in a similar way to the way in which they interact with ARMv8 - floating-point operations. 
+ floating-point operations:: 8VL-1 128 0 bit index +---- //// -----------------+ @@ -483,6 +485,8 @@ ARMv8-A defines the following floating-point / SIMD register state: * 32 128-bit vector registers V0..V31 * 2 32-bit status/control registers FPSR, FPCR +:: + 127 0 bit index +---------------+ V0 | | @@ -517,7 +521,7 @@ References [2] arch/arm64/include/uapi/asm/ptrace.h AArch64 Linux ptrace ABI definitions -[3] Documentation/arm64/cpu-feature-registers.txt +[3] Documentation/arm64/cpu-feature-registers.rst [4] ARM IHI0055C http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf diff --git a/Documentation/arm64/tagged-pointers.txt b/Documentation/arm64/tagged-pointers.rst similarity index 94% rename from Documentation/arm64/tagged-pointers.txt rename to Documentation/arm64/tagged-pointers.rst index a25a99e82bb1..2acdec3ebbeb 100644 --- a/Documentation/arm64/tagged-pointers.txt +++ b/Documentation/arm64/tagged-pointers.rst @@ -1,7 +1,9 @@ - Tagged virtual addresses in AArch64 Linux - ========================================= +========================================= +Tagged virtual addresses in AArch64 Linux +========================================= Author: Will Deacon + Date : 12 June 2013 This document briefly describes the provision of tagged virtual diff --git a/Documentation/translations/zh_CN/arm64/booting.txt b/Documentation/translations/zh_CN/arm64/booting.txt index c1dd968c5ee9..3bfbf66e5a5e 100644 --- a/Documentation/translations/zh_CN/arm64/booting.txt +++ b/Documentation/translations/zh_CN/arm64/booting.txt @@ -1,4 +1,4 @@ -Chinese translated version of Documentation/arm64/booting.txt +Chinese translated version of Documentation/arm64/booting.rst If you have any comment or update to the content, please contact the original document maintainer directly. 
However, if you have a problem @@ -10,7 +10,7 @@ M: Will Deacon zh_CN: Fu Wei C: 55f058e7574c3615dea4615573a19bdb258696c6 --------------------------------------------------------------------- -Documentation/arm64/booting.txt 的中文翻译 +Documentation/arm64/booting.rst 的中文翻译 如果想评论或更新本文的内容,请直接联系原文档的维护者。如果你使用英文 交流有困难的话,也可以向中文版维护者求助。如果本翻译更新不及时或者翻 diff --git a/Documentation/translations/zh_CN/arm64/legacy_instructions.txt b/Documentation/translations/zh_CN/arm64/legacy_instructions.txt index 68362a1ab717..e295cf75f606 100644 --- a/Documentation/translations/zh_CN/arm64/legacy_instructions.txt +++ b/Documentation/translations/zh_CN/arm64/legacy_instructions.txt @@ -1,4 +1,4 @@ -Chinese translated version of Documentation/arm64/legacy_instructions.txt +Chinese translated version of Documentation/arm64/legacy_instructions.rst If you have any comment or update to the content, please contact the original document maintainer directly. However, if you have a problem @@ -10,7 +10,7 @@ Maintainer: Punit Agrawal Suzuki K. Poulose Chinese maintainer: Fu Wei --------------------------------------------------------------------- -Documentation/arm64/legacy_instructions.txt 的中文翻译 +Documentation/arm64/legacy_instructions.rst 的中文翻译 如果想评论或更新本文的内容,请直接联系原文档的维护者。如果你使用英文 交流有困难的话,也可以向中文版维护者求助。如果本翻译更新不及时或者翻 diff --git a/Documentation/translations/zh_CN/arm64/memory.txt b/Documentation/translations/zh_CN/arm64/memory.txt index 19b3a52d5d94..be20f8228b91 100644 --- a/Documentation/translations/zh_CN/arm64/memory.txt +++ b/Documentation/translations/zh_CN/arm64/memory.txt @@ -1,4 +1,4 @@ -Chinese translated version of Documentation/arm64/memory.txt +Chinese translated version of Documentation/arm64/memory.rst If you have any comment or update to the content, please contact the original document maintainer directly. However, if you have a problem @@ -9,7 +9,7 @@ or if there is a problem with the translation. 
Maintainer: Catalin Marinas Chinese maintainer: Fu Wei --------------------------------------------------------------------- -Documentation/arm64/memory.txt 的中文翻译 +Documentation/arm64/memory.rst 的中文翻译 如果想评论或更新本文的内容,请直接联系原文档的维护者。如果你使用英文 交流有困难的话,也可以向中文版维护者求助。如果本翻译更新不及时或者翻 diff --git a/Documentation/translations/zh_CN/arm64/silicon-errata.txt b/Documentation/translations/zh_CN/arm64/silicon-errata.txt index 39477c75c4a4..440c59ac7dce 100644 --- a/Documentation/translations/zh_CN/arm64/silicon-errata.txt +++ b/Documentation/translations/zh_CN/arm64/silicon-errata.txt @@ -1,4 +1,4 @@ -Chinese translated version of Documentation/arm64/silicon-errata.txt +Chinese translated version of Documentation/arm64/silicon-errata.rst If you have any comment or update to the content, please contact the original document maintainer directly. However, if you have a problem @@ -10,7 +10,7 @@ M: Will Deacon zh_CN: Fu Wei C: 1926e54f115725a9248d0c4c65c22acaf94de4c4 --------------------------------------------------------------------- -Documentation/arm64/silicon-errata.txt 的中文翻译 +Documentation/arm64/silicon-errata.rst 的中文翻译 如果想评论或更新本文的内容,请直接联系原文档的维护者。如果你使用英文 交流有困难的话,也可以向中文版维护者求助。如果本翻译更新不及时或者翻 diff --git a/Documentation/translations/zh_CN/arm64/tagged-pointers.txt b/Documentation/translations/zh_CN/arm64/tagged-pointers.txt index 2664d1bd5a1c..77ac3548a16d 100644 --- a/Documentation/translations/zh_CN/arm64/tagged-pointers.txt +++ b/Documentation/translations/zh_CN/arm64/tagged-pointers.txt @@ -1,4 +1,4 @@ -Chinese translated version of Documentation/arm64/tagged-pointers.txt +Chinese translated version of Documentation/arm64/tagged-pointers.rst If you have any comment or update to the content, please contact the original document maintainer directly. However, if you have a problem @@ -9,7 +9,7 @@ or if there is a problem with the translation. 
Maintainer: Will Deacon Chinese maintainer: Fu Wei --------------------------------------------------------------------- -Documentation/arm64/tagged-pointers.txt 的中文翻译 +Documentation/arm64/tagged-pointers.rst 的中文翻译 如果想评论或更新本文的内容,请直接联系原文档的维护者。如果你使用英文 交流有困难的话,也可以向中文版维护者求助。如果本翻译更新不及时或者翻 diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index ba6c42c576dd..68984c284c40 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2205,7 +2205,7 @@ max_vq. This is the maximum vector length available to the guest on this vcpu, and determines which register slices are visible through this ioctl interface. -(See Documentation/arm64/sve.txt for an explanation of the "vq" +(See Documentation/arm64/sve.rst for an explanation of the "vq" nomenclature.) KVM_REG_ARM64_SVE_VLS is only accessible after KVM_ARM_VCPU_INIT. diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h index c9e9a6978e73..8e79ce9c3f5c 100644 --- a/arch/arm64/include/asm/efi.h +++ b/arch/arm64/include/asm/efi.h @@ -83,7 +83,7 @@ static inline unsigned long efi_get_max_fdt_addr(unsigned long dram_base) * guaranteed to cover the kernel Image. * * Since the EFI stub is part of the kernel Image, we can relax the - * usual requirements in Documentation/arm64/booting.txt, which still + * usual requirements in Documentation/arm64/booting.rst, which still * apply to other bootloaders, and are required for some kernel * configurations. 
*/ diff --git a/arch/arm64/include/asm/image.h b/arch/arm64/include/asm/image.h index e2c27a2278e9..c2b13213c720 100644 --- a/arch/arm64/include/asm/image.h +++ b/arch/arm64/include/asm/image.h @@ -27,7 +27,7 @@ /* * struct arm64_image_header - arm64 kernel image header - * See Documentation/arm64/booting.txt for details + * See Documentation/arm64/booting.rst for details * * @code0: Executable code, or * @mz_header alternatively used for part of MZ header diff --git a/arch/arm64/include/uapi/asm/sigcontext.h b/arch/arm64/include/uapi/asm/sigcontext.h index 5f3c0cec5af9..a61f89ddbf34 100644 --- a/arch/arm64/include/uapi/asm/sigcontext.h +++ b/arch/arm64/include/uapi/asm/sigcontext.h @@ -137,7 +137,7 @@ struct sve_context { * vector length beyond its initial architectural limit of 2048 bits * (16 quadwords). * - * See linux/Documentation/arm64/sve.txt for a description of the VL/VQ + * See linux/Documentation/arm64/sve.rst for a description of the VL/VQ * terminology. */ #define SVE_VQ_BYTES __SVE_VQ_BYTES /* bytes per quadword */ diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c index 31cc2f423aa8..2514fd6f12cb 100644 --- a/arch/arm64/kernel/kexec_image.c +++ b/arch/arm64/kernel/kexec_image.c @@ -53,7 +53,7 @@ static void *image_load(struct kimage *image, /* * We require a kernel with an unambiguous Image header. Per - * Documentation/arm64/booting.txt, this is the case when image_size + * Documentation/arm64/booting.rst, this is the case when image_size * is non-zero (practically speaking, since v3.17). */ h = (struct arm64_image_header *)kernel; From e327cfcb25422c91f4bb8e8a3488386ac95955f1 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Wed, 12 Jun 2019 14:52:39 -0300 Subject: [PATCH 070/129] docs: cdrom-standard.tex: convert from LaTeX to ReST This is the only LaTeX documentation file inside the documentation. 
Instead of having a LaTeX document directly there, convert it to ReST format, as this is the format we're using for docs. For now, let's keep the extension as .txt in order to avoid warnings when building the documentation with Sphinx. The next patch will rename it to .rst and add it to the building system. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- Documentation/cdrom/Makefile | 21 - Documentation/cdrom/cdrom-standard.tex | 1026 ----------------------- Documentation/cdrom/cdrom-standard.txt | 1063 ++++++++++++++++++++++++ drivers/cdrom/cdrom.c | 2 +- 4 files changed, 1064 insertions(+), 1048 deletions(-) delete mode 100644 Documentation/cdrom/Makefile delete mode 100644 Documentation/cdrom/cdrom-standard.tex create mode 100644 Documentation/cdrom/cdrom-standard.txt diff --git a/Documentation/cdrom/Makefile b/Documentation/cdrom/Makefile deleted file mode 100644 index a19e321928e1..000000000000 --- a/Documentation/cdrom/Makefile +++ /dev/null @@ -1,21 +0,0 @@ -LATEXFILE = cdrom-standard - -all: - make clean - latex $(LATEXFILE) - latex $(LATEXFILE) - @if [ -x `which gv` ]; then \ - `dvips -q -t letter -o $(LATEXFILE).ps $(LATEXFILE).dvi` ;\ - `gv -antialias -media letter -nocenter $(LATEXFILE).ps` ;\ - else \ - `xdvi $(LATEXFILE).dvi &` ;\ - fi - make sortofclean - -clean: - rm -f $(LATEXFILE).ps $(LATEXFILE).dvi $(LATEXFILE).aux $(LATEXFILE).log - -sortofclean: - rm -f $(LATEXFILE).aux $(LATEXFILE).log - - diff --git a/Documentation/cdrom/cdrom-standard.tex b/Documentation/cdrom/cdrom-standard.tex deleted file mode 100644 index f7cd455973f7..000000000000 --- a/Documentation/cdrom/cdrom-standard.tex +++ /dev/null @@ -1,1026 +0,0 @@ -\documentclass{article} -\def\version{$Id: cdrom-standard.tex,v 1.9 1997/12/28 15:42:49 david Exp $} -\newcommand{\newsection}[1]{\newpage\section{#1}} - -\evensidemargin=0pt -\oddsidemargin=0pt -\topmargin=-\headheight \advance\topmargin by -\headsep -\textwidth=15.99cm \textheight=24.62cm % normal
A4, 1'' margin - -\def\linux{{\sc Linux}} -\def\cdrom{{\sc cd-rom}} -\def\UCD{{\sc Uniform cd-rom Driver}} -\def\cdromc{{\tt {cdrom.c}}} -\def\cdromh{{\tt {cdrom.h}}} -\def\fo{\sl} % foreign words -\def\ie{{\fo i.e.}} -\def\eg{{\fo e.g.}} - -\everymath{\it} \everydisplay{\it} -\catcode `\_=\active \def_{\_\penalty100 } -\catcode`\<=\active \def<#1>{{\langle\hbox{\rm#1}\rangle}} - -\begin{document} -\title{A \linux\ \cdrom\ standard} -\author{David van Leeuwen\\{\normalsize\tt david@ElseWare.cistron.nl} -\\{\footnotesize updated by Erik Andersen {\tt(andersee@debian.org)}} -\\{\footnotesize updated by Jens Axboe {\tt(axboe@image.dk)}}} -\date{12 March 1999} - -\maketitle - -\newsection{Introduction} - -\linux\ is probably the Unix-like operating system that supports -the widest variety of hardware devices. The reasons for this are -presumably -\begin{itemize} -\item - The large list of hardware devices available for the many platforms - that \linux\ now supports (\ie, i386-PCs, Sparc Suns, etc.) -\item - The open design of the operating system, such that anybody can write a - driver for \linux. -\item - There is plenty of source code around as examples of how to write a driver. -\end{itemize} -The openness of \linux, and the many different types of available -hardware has allowed \linux\ to support many different hardware devices. -Unfortunately, the very openness that has allowed \linux\ to support -all these different devices has also allowed the behavior of each -device driver to differ significantly from one device to another. -This divergence of behavior has been very significant for \cdrom\ -devices; the way a particular drive reacts to a `standard' $ioctl()$ -call varies greatly from one device driver to another. To avoid making -their drivers totally inconsistent, the writers of \linux\ \cdrom\ -drivers generally created new device drivers by understanding, copying, -and then changing an existing one. 
Unfortunately, this practice did not -maintain uniform behavior across all the \linux\ \cdrom\ drivers. - -This document describes an effort to establish Uniform behavior across -all the different \cdrom\ device drivers for \linux. This document also -defines the various $ioctl$s, and how the low-level \cdrom\ device -drivers should implement them. Currently (as of the \linux\ 2.1.$x$ -development kernels) several low-level \cdrom\ device drivers, including -both IDE/ATAPI and SCSI, now use this Uniform interface. - -When the \cdrom\ was developed, the interface between the \cdrom\ drive -and the computer was not specified in the standards. As a result, many -different \cdrom\ interfaces were developed. Some of them had their -own proprietary design (Sony, Mitsumi, Panasonic, Philips), other -manufacturers adopted an existing electrical interface and changed -the functionality (CreativeLabs/SoundBlaster, Teac, Funai) or simply -adapted their drives to one or more of the already existing electrical -interfaces (Aztech, Sanyo, Funai, Vertos, Longshine, Optics Storage and -most of the `NoName' manufacturers). In cases where a new drive really -brought its own interface or used its own command set and flow control -scheme, either a separate driver had to be written, or an existing -driver had to be enhanced. History has delivered us \cdrom\ support for -many of these different interfaces. Nowadays, almost all new \cdrom\ -drives are either IDE/ATAPI or SCSI, and it is very unlikely that any -manufacturer will create a new interface. Even finding drives for the -old proprietary interfaces is getting difficult. 
- -When (in the 1.3.70's) I looked at the existing software interface, -which was expressed through \cdromh, it appeared to be a rather wild -set of commands and data formats.\footnote{I cannot recollect what -kernel version I looked at, then, presumably 1.2.13 and 1.3.34---the -latest kernel that I was indirectly involved in.} It seemed that many -features of the software interface had been added to accommodate the -capabilities of a particular drive, in an {\fo ad hoc\/} manner. More -importantly, it appeared that the behavior of the `standard' commands -was different for most of the different drivers: \eg, some drivers -close the tray if an $open()$ call occurs when the tray is open, while -others do not. Some drivers lock the door upon opening the device, to -prevent an incoherent file system, but others don't, to allow software -ejection. Undoubtedly, the capabilities of the different drives vary, -but even when two drives have the same capability their drivers' -behavior was usually different. - -I decided to start a discussion on how to make all the \linux\ \cdrom\ -drivers behave more uniformly. I began by contacting the developers of -the many \cdrom\ drivers found in the \linux\ kernel. Their reactions -encouraged me to write the \UCD\ which this document is intended to -describe. The implementation of the \UCD\ is in the file \cdromc. This -driver is intended to be an additional software layer that sits on top -of the low-level device drivers for each \cdrom\ drive. By adding this -additional layer, it is possible to have all the different \cdrom\ -devices behave {\em exactly\/} the same (insofar as the underlying -hardware will allow). - -The goal of the \UCD\ is {\em not\/} to alienate driver developers who -have not yet taken steps to support this effort. The goal of \UCD\ is -simply to give people writing application programs for \cdrom\ drives -{\em one\/} \linux\ \cdrom\ interface with consistent behavior for all -\cdrom\ devices. 
In addition, this also provides a consistent interface -between the low-level device driver code and the \linux\ kernel. Care -is taken that 100\,\% compatibility exists with the data structures and -programmer's interface defined in \cdromh. This guide was written to -help \cdrom\ driver developers adapt their code to use the \UCD\ code -defined in \cdromc. - -Personally, I think that the most important hardware interfaces are -the IDE/ATAPI drives and, of course, the SCSI drives, but as prices -of hardware drop continuously, it is also likely that people may have -more than one \cdrom\ drive, possibly of mixed types. It is important -that these drives behave in the same way. In December 1994, one of the -cheapest \cdrom\ drives was a Philips cm206, a double-speed proprietary -drive. In the months that I was busy writing a \linux\ driver for it, -proprietary drives became obsolete and IDE/ATAPI drives became the -standard. At the time of the last update to this document (November -1997) it is becoming difficult to even {\em find} anything less than a -16 speed \cdrom\ drive, and 24 speed drives are common. - -\newsection{Standardizing through another software level} -\label{cdrom.c} - -At the time this document was conceived, all drivers directly -implemented the \cdrom\ $ioctl()$ calls through their own routines. This -led to the danger of different drivers forgetting to do important things -like checking that the user was giving the driver valid data. More -importantly, this led to the divergence of behavior, which has already -been discussed. - -For this reason, the \UCD\ was created to enforce consistent \cdrom\ -drive behavior, and to provide a common set of services to the various -low-level \cdrom\ device drivers. The \UCD\ now provides another -software-level, that separates the $ioctl()$ and $open()$ implementation -from the actual hardware implementation. Note that this effort has -made few changes which will affect a user's application programs. 
The -greatest change involved moving the contents of the various low-level -\cdrom\ drivers' header files to the kernel's cdrom directory. This was -done to help ensure that the user is only presented with only one cdrom -interface, the interface defined in \cdromh. - -\cdrom\ drives are specific enough (\ie, different from other -block-devices such as floppy or hard disc drives), to define a set -of common {\em \cdrom\ device operations}, $_dops$. -These operations are different from the classical block-device file -operations, $_fops$. - -The routines for the \UCD\ interface level are implemented in the file -\cdromc. In this file, the \UCD\ interfaces with the kernel as a block -device by registering the following general $struct\ file_operations$: -$$ -\halign{$#$\ \hfil&$#$\ \hfil&$/*$ \rm# $*/$\hfil\cr -struct& file_operations\ cdrom_fops = \{\hidewidth\cr - &NULL, & lseek \cr - &block_read, & read---general block-dev read \cr - &block_write, & write---general block-dev write \cr - &NULL, & readdir \cr - &NULL, & select \cr - &cdrom_ioctl, & ioctl \cr - &NULL, & mmap \cr - &cdrom_open, & open \cr - &cdrom_release, & release \cr - &NULL, & fsync \cr - &NULL, & fasync \cr - &cdrom_media_changed, & media change \cr - &NULL & revalidate \cr -\};\cr -} -$$ - -Every active \cdrom\ device shares this $struct$. The routines -declared above are all implemented in \cdromc, since this file is the -place where the behavior of all \cdrom-devices is defined and -standardized. The actual interface to the various types of \cdrom\ -hardware is still performed by various low-level \cdrom-device -drivers. These routines simply implement certain {\em capabilities\/} -that are common to all \cdrom\ (and really, all removable-media -devices). - -Registration of a low-level \cdrom\ device driver is now done through -the general routines in \cdromc, not through the Virtual File System -(VFS) any more. 
The interface implemented in \cdromc\ is carried out -through two general structures that contain information about the -capabilities of the driver, and the specific drives on which the -driver operates. The structures are: -\begin{description} -\item[$cdrom_device_ops$] - This structure contains information about the low-level driver for a - \cdrom\ device. This structure is conceptually connected to the major - number of the device (although some drivers may have different - major numbers, as is the case for the IDE driver). -\item[$cdrom_device_info$] - This structure contains information about a particular \cdrom\ drive, - such as its device name, speed, etc. This structure is conceptually - connected to the minor number of the device. -\end{description} - -Registering a particular \cdrom\ drive with the \UCD\ is done by the -low-level device driver though a call to: -$$register_cdrom(struct\ cdrom_device_info * _info) -$$ -The device information structure, $_info$, contains all the -information needed for the kernel to interface with the low-level -\cdrom\ device driver. One of the most important entries in this -structure is a pointer to the $cdrom_device_ops$ structure of the -low-level driver. - -The device operations structure, $cdrom_device_ops$, contains a list -of pointers to the functions which are implemented in the low-level -device driver. When \cdromc\ accesses a \cdrom\ device, it does it -through the functions in this structure. It is impossible to know all -the capabilities of future \cdrom\ drives, so it is expected that this -list may need to be expanded from time to time as new technologies are -developed. For example, CD-R and CD-R/W drives are beginning to become -popular, and support will soon need to be added for them. 
For now, the -current $struct$ is: -$$ -\halign{$#$\ \hfil&$#$\ \hfil&\hbox to 10em{$#$\hss}& - $/*$ \rm# $*/$\hfil\cr -struct& cdrom_device_ops\ \{ \hidewidth\cr - &int& (* open)(struct\ cdrom_device_info *, int)\cr - &void& (* release)(struct\ cdrom_device_info *);\cr - &int& (* drive_status)(struct\ cdrom_device_info *, int);\cr - &unsigned\ int& (* check_events)(struct\ cdrom_device_info *, unsigned\ int, int);\cr - &int& (* media_changed)(struct\ cdrom_device_info *, int);\cr - &int& (* tray_move)(struct\ cdrom_device_info *, int);\cr - &int& (* lock_door)(struct\ cdrom_device_info *, int);\cr - &int& (* select_speed)(struct\ cdrom_device_info *, int);\cr - &int& (* select_disc)(struct\ cdrom_device_info *, int);\cr - &int& (* get_last_session) (struct\ cdrom_device_info *, - struct\ cdrom_multisession *{});\cr - &int& (* get_mcn)(struct\ cdrom_device_info *, struct\ cdrom_mcn *{});\cr - &int& (* reset)(struct\ cdrom_device_info *);\cr - &int& (* audio_ioctl)(struct\ cdrom_device_info *, unsigned\ int, - void *{});\cr -\noalign{\medskip} - &const\ int& capability;& capability flags \cr - &int& (* generic_packet)(struct\ cdrom_device_info *, struct\ packet_command *{});\cr -\};\cr -} -$$ -When a low-level device driver implements one of these capabilities, -it should add a function pointer to this $struct$. When a particular -function is not implemented, however, this $struct$ should contain a -NULL instead. The $capability$ flags specify the capabilities of the -\cdrom\ hardware and/or low-level \cdrom\ driver when a \cdrom\ drive -is registered with the \UCD. - -Note that most functions have fewer parameters than their -$blkdev_fops$ counterparts. This is because very little of the -information in the structures $inode$ and $file$ is used. For most -drivers, the main parameter is the $struct$ $cdrom_device_info$, from -which the major and minor number can be extracted. 
(Most low-level -\cdrom\ drivers don't even look at the major and minor number though, -since many of them only support one device.) This will be available -through $dev$ in $cdrom_device_info$ described below. - -The drive-specific, minor-like information that is registered with -\cdromc, currently contains the following fields: -$$ -\halign{$#$\ \hfil&$#$\ \hfil&\hbox to 10em{$#$\hss}& - $/*$ \rm# $*/$\hfil\cr -struct& cdrom_device_info\ \{ \hidewidth\cr - & const\ struct\ cdrom_device_ops *& ops;& device operations for this major\cr - & struct\ list_head& list;& linked list of all device_info\cr - & struct\ gendisk *& disk;& matching block layer disk\cr - & void *& handle;& driver-dependent data\cr -\noalign{\medskip} - & int& mask;& mask of capability: disables them \cr - & int& speed;& maximum speed for reading data \cr - & int& capacity;& number of discs in a jukebox \cr -\noalign{\medskip} - &unsigned\ int& options : 30;& options flags \cr - &unsigned& mc_flags : 2;& media-change buffer flags \cr - &unsigned\ int& vfs_events;& cached events for vfs path\cr - &unsigned\ int& ioctl_events;& cached events for ioctl path\cr - & int& use_count;& number of times device is opened\cr - & char& name[20];& name of the device type\cr -\noalign{\medskip} - &__u8& sanyo_slot : 2;& Sanyo 3-CD changer support\cr - &__u8& keeplocked : 1;& CDROM_LOCKDOOR status\cr - &__u8& reserved : 5;& not used yet\cr - & int& cdda_method;& see CDDA_* flags\cr - &__u8& last_sense;& saves last sense key\cr - &__u8& media_written;& dirty flag, DVD+RW bookkeeping\cr - &unsigned\ short& mmc3_profile;& current MMC3 profile\cr - & int& for_data;& unknown:TBD\cr - & int\ (* exit)\ (struct\ cdrom_device_info *);&& unknown:TBD\cr - & int& mrw_mode_page;& which MRW mode page is in use\cr -\}\cr -}$$ -Using this $struct$, a linked list of the registered minor devices is -built, using the $next$ field. 
The device number, the device operations -struct and specifications of properties of the drive are stored in this -structure. - -The $mask$ flags can be used to mask out some of the capabilities listed -in $ops\to capability$, if a specific drive doesn't support a feature -of the driver. The value $speed$ specifies the maximum head-rate of the -drive, measured in units of normal audio speed (176\,kB/sec raw data or -150\,kB/sec file system data). The parameters are declared $const$ -because they describe properties of the drive, which don't change after -registration. - -A few registers contain variables local to the \cdrom\ drive. The -flags $options$ are used to specify how the general \cdrom\ routines -should behave. These various flags registers should provide enough -flexibility to adapt to the different users' wishes (and {\em not\/} the -`arbitrary' wishes of the author of the low-level device driver, as is -the case in the old scheme). The register $mc_flags$ is used to buffer -the information from $media_changed()$ to two separate queues. Other -data that is specific to a minor drive, can be accessed through $handle$, -which can point to a data structure specific to the low-level driver. -The fields $use_count$, $next$, $options$ and $mc_flags$ need not be -initialized. - -The intermediate software layer that \cdromc\ forms will perform some -additional bookkeeping. The use count of the device (the number of -processes that have the device opened) is registered in $use_count$. The -function $cdrom_ioctl()$ will verify the appropriate user-memory regions -for read and write, and in case a location on the CD is transferred, -it will `sanitize' the format by making requests to the low-level -drivers in a standard format, and translating all formats between the -user-software and low level drivers. This relieves much of the drivers' -memory checking and format checking and translation. Also, the necessary -structures will be declared on the program stack. 
- -The implementation of the functions should be as defined in the -following sections. Two functions {\em must\/} be implemented, namely -$open()$ and $release()$. Other functions may be omitted, their -corresponding capability flags will be cleared upon registration. -Generally, a function returns zero on success and negative on error. A -function call should return only after the command has completed, but of -course waiting for the device should not use processor time. - -\subsection{$Int\ open(struct\ cdrom_device_info * cdi, int\ purpose)$} - -$Open()$ should try to open the device for a specific $purpose$, which -can be either: -\begin{itemize} -\item[0] Open for reading data, as done by {\tt {mount()}} (2), or the -user commands {\tt {dd}} or {\tt {cat}}. -\item[1] Open for $ioctl$ commands, as done by audio-CD playing -programs. -\end{itemize} -Notice that any strategic code (closing tray upon $open()$, etc.)\ is -done by the calling routine in \cdromc, so the low-level routine -should only be concerned with proper initialization, such as spinning -up the disc, etc. % and device-use count - - -\subsection{$Void\ release(struct\ cdrom_device_info * cdi)$} - - -Device-specific actions should be taken such as spinning down the device. -However, strategic actions such as ejection of the tray, or unlocking -the door, should be left over to the general routine $cdrom_release()$. -This is the only function returning type $void$. - -\subsection{$Int\ drive_status(struct\ cdrom_device_info * cdi, int\ slot_nr)$} -\label{drive status} - -The function $drive_status$, if implemented, should provide -information on the status of the drive (not the status of the disc, -which may or may not be in the drive). If the drive is not a changer, -$slot_nr$ should be ignored. 
In \cdromh\ the possibilities are listed: -$$ -\halign{$#$\ \hfil&$/*$ \rm# $*/$\hfil\cr -CDS_NO_INFO& no information available\cr -CDS_NO_DISC& no disc is inserted, tray is closed\cr -CDS_TRAY_OPEN& tray is opened\cr -CDS_DRIVE_NOT_READY& something is wrong, tray is moving?\cr -CDS_DISC_OK& a disc is loaded and everything is fine\cr -} -$$ - -\subsection{$Int\ media_changed(struct\ cdrom_device_info * cdi, int\ disc_nr)$} - -This function is very similar to the original function in $struct\ -file_operations$. It returns 1 if the medium of the device $cdi\to -dev$ has changed since the last call, and 0 otherwise. The parameter -$disc_nr$ identifies a specific slot in a juke-box, it should be -ignored for single-disc drives. Note that by `re-routing' this -function through $cdrom_media_changed()$, we can implement separate -queues for the VFS and a new $ioctl()$ function that can report device -changes to software (\eg, an auto-mounting daemon). - -\subsection{$Int\ tray_move(struct\ cdrom_device_info * cdi, int\ position)$} - -This function, if implemented, should control the tray movement. (No -other function should control this.) The parameter $position$ controls -the desired direction of movement: -\begin{itemize} -\item[0] Close tray -\item[1] Open tray -\end{itemize} -This function returns 0 upon success, and a non-zero value upon -error. Note that if the tray is already in the desired position, no -action need be taken, and the return value should be 0. - -\subsection{$Int\ lock_door(struct\ cdrom_device_info * cdi, int\ lock)$} - -This function (and no other code) controls locking of the door, if the -drive allows this. The value of $lock$ controls the desired locking -state: -\begin{itemize} -\item[0] Unlock door, manual opening is allowed -\item[1] Lock door, tray cannot be ejected manually -\end{itemize} -This function returns 0 upon success, and a non-zero value upon -error. 
Note that if the door is already in the requested state, no -action need be taken, and the return value should be 0. - -\subsection{$Int\ select_speed(struct\ cdrom_device_info * cdi, int\ speed)$} - -Some \cdrom\ drives are capable of changing their head-speed. There -are several reasons for changing the speed of a \cdrom\ drive. Badly -pressed \cdrom s may benefit from less-than-maximum head rate. Modern -\cdrom\ drives can obtain very high head rates (up to $24\times$ is -common). It has been reported that these drives can make reading -errors at these high speeds, reducing the speed can prevent data loss -in these circumstances. Finally, some of these drives can -make an annoyingly loud noise, which a lower speed may reduce. %Finally, -%although the audio-low-pass filters probably aren't designed for it, -%more than real-time playback of audio might be used for high-speed -%copying of audio tracks. - -This function specifies the speed at which data is read or audio is -played back. The value of $speed$ specifies the head-speed of the -drive, measured in units of standard cdrom speed (176\,kB/sec raw data -or 150\,kB/sec file system data). So to request that a \cdrom\ drive -operate at 300\,kB/sec you would call the CDROM_SELECT_SPEED $ioctl$ -with $speed=2$. The special value `0' means `auto-selection', \ie, -maximum data-rate or real-time audio rate. If the drive doesn't have -this `auto-selection' capability, the decision should be made on the -current disc loaded and the return value should be positive. A negative -return value indicates an error. - -\subsection{$Int\ select_disc(struct\ cdrom_device_info * cdi, int\ number)$} - -If the drive can store multiple discs (a juke-box) this function -will perform disc selection. It should return the number of the -selected disc on success, a negative value on error. Currently, only -the ide-cd driver supports this functionality. 
- -\subsection{$Int\ get_last_session(struct\ cdrom_device_info * cdi, struct\ - cdrom_multisession * ms_info)$} - -This function should implement the old corresponding $ioctl()$. For -device $cdi\to dev$, the start of the last session of the current disc -should be returned in the pointer argument $ms_info$. Note that -routines in \cdromc\ have sanitized this argument: its requested -format will {\em always\/} be of the type $CDROM_LBA$ (linear block -addressing mode), whatever the calling software requested. But -sanitization goes even further: the low-level implementation may -return the requested information in $CDROM_MSF$ format if it wishes so -(setting the $ms_info\rightarrow addr_format$ field appropriately, of -course) and the routines in \cdromc\ will make the transformation if -necessary. The return value is 0 upon success. - -\subsection{$Int\ get_mcn(struct\ cdrom_device_info * cdi, struct\ - cdrom_mcn * mcn)$} - -Some discs carry a `Media Catalog Number' (MCN), also called -`Universal Product Code' (UPC). This number should reflect the number -that is generally found in the bar-code on the product. Unfortunately, -the few discs that carry such a number on the disc don't even use the -same format. The return argument to this function is a pointer to a -pre-declared memory region of type $struct\ cdrom_mcn$. The MCN is -expected as a 13-character string, terminated by a null-character. - -\subsection{$Int\ reset(struct\ cdrom_device_info * cdi)$} - -This call should perform a hard-reset on the drive (although in -circumstances that a hard-reset is necessary, a drive may very well not -listen to commands anymore). Preferably, control is returned to the -caller only after the drive has finished resetting. If the drive is no -longer listening, it may be wise for the underlying low-level cdrom -driver to time out. 
- -\subsection{$Int\ audio_ioctl(struct\ cdrom_device_info * cdi, unsigned\ - int\ cmd, void * arg)$} - -Some of the \cdrom-$ioctl$s defined in \cdromh\ can be -implemented by the routines described above, and hence the function -$cdrom_ioctl$ will use those. However, most $ioctl$s deal with -audio-control. We have decided to leave these to be accessed through a -single function, repeating the arguments $cmd$ and $arg$. Note that -the latter is of type $void*{}$, rather than $unsigned\ long\ -int$. The routine $cdrom_ioctl()$ does do some useful things, -though. It sanitizes the address format type to $CDROM_MSF$ (Minutes, -Seconds, Frames) for all audio calls. It also verifies the memory -location of $arg$, and reserves stack-memory for the argument. This -makes implementation of the $audio_ioctl()$ much simpler than in the -old driver scheme. For example, you may look up the function -$cm206_audio_ioctl()$ in {\tt {cm206.c}} that should be updated with -this documentation. - -An unimplemented ioctl should return $-ENOSYS$, but a harmless request -(\eg, $CDROMSTART$) may be ignored by returning 0 (success). Other -errors should be according to the standards, whatever they are. When -an error is returned by the low-level driver, the \UCD\ tries whenever -possible to return the error code to the calling program. (We may decide -to sanitize the return value in $cdrom_ioctl()$ though, in order to -guarantee a uniform interface to the audio-player software.) - -\subsection{$Int\ dev_ioctl(struct\ cdrom_device_info * cdi, unsigned\ int\ - cmd, unsigned\ long\ arg)$} - -Some $ioctl$s seem to be specific to certain \cdrom\ drives. That is, -they are introduced to service some capabilities of certain drives. In -fact, there are 6 different $ioctl$s for reading data, either in some -particular kind of format, or audio data. Not many drives support -reading audio tracks as data, I believe this is because of protection -of copyrights of artists. 
Moreover, I think that if audio-tracks are -supported, it should be done through the VFS and not via $ioctl$s. A -problem here could be the fact that audio-frames are 2352 bytes long, -so either the audio-file-system should ask for 75264 bytes at once -(the least common multiple of 512 and 2352), or the drivers should -bend their backs to cope with this incoherence (to which I would be -opposed). Furthermore, it is very difficult for the hardware to find -the exact frame boundaries, since there are no synchronization headers -in audio frames. Once these issues are resolved, this code should be -standardized in \cdromc. - -Because there are so many $ioctl$s that seem to be introduced to -satisfy certain drivers,\footnote{Is there software around that - actually uses these? I'd be interested!} any `non-standard' $ioctl$s -are routed through the call $dev_ioctl()$. In principle, `private' -$ioctl$s should be numbered after the device's major number, and not -the general \cdrom\ $ioctl$ number, {\tt {0x53}}. Currently the -non-supported $ioctl$s are: {\it CDROMREADMODE1, CDROMREADMODE2, - CDROMREADAUDIO, CDROMREADRAW, CDROMREADCOOKED, CDROMSEEK, - CDROMPLAY\-BLK and CDROM\-READALL}. - - -\subsection{\cdrom\ capabilities} -\label{capability} - -Instead of just implementing some $ioctl$ calls, the interface in -\cdromc\ supplies the possibility to indicate the {\em capabilities\/} -of a \cdrom\ drive. This can be done by ORing any number of -capability-constants that are defined in \cdromh\ at the registration -phase. 
Currently, the capabilities are any of: -$$ -\halign{$#$\ \hfil&$/*$ \rm# $*/$\hfil\cr -CDC_CLOSE_TRAY& can close tray by software control\cr -CDC_OPEN_TRAY& can open tray\cr -CDC_LOCK& can lock and unlock the door\cr -CDC_SELECT_SPEED& can select speed, in units of $\sim$150\,kB/s\cr -CDC_SELECT_DISC& drive is juke-box\cr -CDC_MULTI_SESSION& can read sessions $>\rm1$\cr -CDC_MCN& can read Media Catalog Number\cr -CDC_MEDIA_CHANGED& can report if disc has changed\cr -CDC_PLAY_AUDIO& can perform audio-functions (play, pause, etc)\cr -CDC_RESET& hard reset device\cr -CDC_IOCTLS& driver has non-standard ioctls\cr -CDC_DRIVE_STATUS& driver implements drive status\cr -} -$$ -The capability flag is declared $const$, to prevent drivers from -accidentally tampering with the contents. The capability fags actually -inform \cdromc\ of what the driver can do. If the drive found -by the driver does not have the capability, is can be masked out by -the $cdrom_device_info$ variable $mask$. For instance, the SCSI \cdrom\ -driver has implemented the code for loading and ejecting \cdrom's, and -hence its corresponding flags in $capability$ will be set. But a SCSI -\cdrom\ drive might be a caddy system, which can't load the tray, and -hence for this drive the $cdrom_device_info$ struct will have set -the $CDC_CLOSE_TRAY$ bit in $mask$. - -In the file \cdromc\ you will encounter many constructions of the type -$$\it -if\ (cdo\rightarrow capability \mathrel\& \mathord{\sim} cdi\rightarrow mask - \mathrel{\&} CDC_) \ldots -$$ -There is no $ioctl$ to set the mask\dots The reason is that -I think it is better to control the {\em behavior\/} rather than the -{\em capabilities}. - -\subsection{Options} - -A final flag register controls the {\em behavior\/} of the \cdrom\ -drives, in order to satisfy different users' wishes, hopefully -independently of the ideas of the respective author who happened to -have made the drive's support available to the \linux\ community. 
The -current behavior options are: -$$ -\halign{$#$\ \hfil&$/*$ \rm# $*/$\hfil\cr -CDO_AUTO_CLOSE& try to close tray upon device $open()$\cr -CDO_AUTO_EJECT& try to open tray on last device $close()$\cr -CDO_USE_FFLAGS& use $file_pointer\rightarrow f_flags$ to indicate - purpose for $open()$\cr -CDO_LOCK& try to lock door if device is opened\cr -CDO_CHECK_TYPE& ensure disc type is data if opened for data\cr -} -$$ - -The initial value of this register is $CDO_AUTO_CLOSE \mathrel| -CDO_USE_FFLAGS \mathrel| CDO_LOCK$, reflecting my own view on user -interface and software standards. Before you protest, there are two -new $ioctl$s implemented in \cdromc, that allow you to control the -behavior by software. These are: -$$ -\halign{$#$\ \hfil&$/*$ \rm# $*/$\hfil\cr -CDROM_SET_OPTIONS& set options specified in $(int)\ arg$\cr -CDROM_CLEAR_OPTIONS& clear options specified in $(int)\ arg$\cr -} -$$ -One option needs some more explanation: $CDO_USE_FFLAGS$. In the next -newsection we explain what the need for this option is. - -A software package {\tt setcd}, available from the Debian distribution -and {\tt sunsite.unc.edu}, allows user level control of these flags. - -\newsection{The need to know the purpose of opening the \cdrom\ device} - -Traditionally, Unix devices can be used in two different `modes', -either by reading/writing to the device file, or by issuing -controlling commands to the device, by the device's $ioctl()$ -call. The problem with \cdrom\ drives, is that they can be used for -two entirely different purposes. One is to mount removable -file systems, \cdrom s, the other is to play audio CD's. Audio commands -are implemented entirely through $ioctl$s, presumably because the -first implementation (SUN?) has been such. In principle there is -nothing wrong with this, but a good control of the `CD player' demands -that the device can {\em always\/} be opened in order to give the -$ioctl$ commands, regardless of the state the drive is in. 
- -On the other hand, when used as a removable-media disc drive (what the -original purpose of \cdrom s is) we would like to make sure that the -disc drive is ready for operation upon opening the device. In the old -scheme, some \cdrom\ drivers don't do any integrity checking, resulting -in a number of i/o errors reported by the VFS to the kernel when an -attempt for mounting a \cdrom\ on an empty drive occurs. This is not a -particularly elegant way to find out that there is no \cdrom\ inserted; -it more-or-less looks like the old IBM-PC trying to read an empty floppy -drive for a couple of seconds, after which the system complains it -can't read from it. Nowadays we can {\em sense\/} the existence of a -removable medium in a drive, and we believe we should exploit that -fact. An integrity check on opening of the device, that verifies the -availability of a \cdrom\ and its correct type (data), would be -desirable. - -These two ways of using a \cdrom\ drive, principally for data and -secondarily for playing audio discs, have different demands for the -behavior of the $open()$ call. Audio use simply wants to open the -device in order to get a file handle which is needed for issuing -$ioctl$ commands, while data use wants to open for correct and -reliable data transfer. The only way user programs can indicate what -their {\em purpose\/} of opening the device is, is through the $flags$ -parameter (see {\tt {open(2)}}). For \cdrom\ devices, these flags aren't -implemented (some drivers implement checking for write-related flags, -but this is not strictly necessary if the device file has correct -permission flags). Most option flags simply don't make sense to -\cdrom\ devices: $O_CREAT$, $O_NOCTTY$, $O_TRUNC$, $O_APPEND$, and -$O_SYNC$ have no meaning to a \cdrom. - -We therefore propose to use the flag $O_NONBLOCK$ to indicate -that the device is opened just for issuing $ioctl$ -commands. 
Strictly, the meaning of $O_NONBLOCK$ is that opening and -subsequent calls to the device don't cause the calling process to -wait. We could interpret this as ``don't wait until someone has -inserted some valid data-\cdrom.'' Thus, our proposal of the -implementation for the $open()$ call for \cdrom s is: -\begin{itemize} -\item If no other flags are set than $O_RDONLY$, the device is opened -for data transfer, and the return value will be 0 only upon successful -initialization of the transfer. The call may even induce some actions -on the \cdrom, such as closing the tray. -\item If the option flag $O_NONBLOCK$ is set, opening will always be -successful, unless the whole device doesn't exist. The drive will take -no actions whatsoever. -\end{itemize} - -\subsection{And what about standards?} - -You might hesitate to accept this proposal as it comes from the -\linux\ community, and not from some standardizing institute. What -about SUN, SGI, HP and all those other Unix and hardware vendors? -Well, these companies are in the lucky position that they generally -control both the hardware and software of their supported products, -and are large enough to set their own standard. They do not have to -deal with a dozen or more different, competing hardware -configurations.\footnote{Incidentally, I think that SUN's approach to -mounting \cdrom s is very good in origin: under Solaris a -volume-daemon automatically mounts a newly inserted \cdrom\ under {\tt -{/cdrom/$$/}}. In my opinion they should have pushed this -further and have {\em every\/} \cdrom\ on the local area network be -mounted at the similar location, \ie, no matter in which particular -machine you insert a \cdrom, it will always appear at the same -position in the directory tree, on every system. 
When I wanted to -implement such a user-program for \linux, I came across the -differences in behavior of the various drivers, and the need for an -$ioctl$ informing about media changes.} - -We believe that using $O_NONBLOCK$ to indicate that a device is being opened -for $ioctl$ commands only can be easily introduced in the \linux\ -community. All the CD-player authors will have to be informed, we can -even send in our own patches to the programs. The use of $O_NONBLOCK$ -has most likely no influence on the behavior of the CD-players on -other operating systems than \linux. Finally, a user can always revert -to old behavior by a call to $ioctl(file_descriptor, CDROM_CLEAR_OPTIONS, -CDO_USE_FFLAGS)$. - -\subsection{The preferred strategy of $open()$} - -The routines in \cdromc\ are designed in such a way that run-time -configuration of the behavior of \cdrom\ devices (of {\em any\/} type) -can be carried out, by the $CDROM_SET/CLEAR_OPTIONS$ $ioctls$. Thus, various -modes of operation can be set: -\begin{description} -\item[$CDO_AUTO_CLOSE \mathrel| CDO_USE_FFLAGS \mathrel| CDO_LOCK$] This -is the default setting. (With $CDO_CHECK_TYPE$ it will be better, in the -future.) If the device is not yet opened by any other process, and if -the device is being opened for data ($O_NONBLOCK$ is not set) and the -tray is found to be open, an attempt to close the tray is made. Then, -it is verified that a disc is in the drive and, if $CDO_CHECK_TYPE$ is -set, that it contains tracks of type `data mode 1.' Only if all tests -are passed is the return value zero. The door is locked to prevent file -system corruption. If the drive is opened for audio ($O_NONBLOCK$ is -set), no actions are taken and a value of 0 will be returned. -\item[$CDO_AUTO_CLOSE \mathrel| CDO_AUTO_EJECT \mathrel| CDO_LOCK$] This -mimics the behavior of the current sbpcd-driver. The option flags are -ignored, the tray is closed on the first open, if necessary. 
Similarly, -the tray is opened on the last release, \ie, if a \cdrom\ is unmounted, -it is automatically ejected, such that the user can replace it. -\end{description} -We hope that these option can convince everybody (both driver -maintainers and user program developers) to adopt the new \cdrom\ -driver scheme and option flag interpretation. - -\newsection{Description of routines in \cdromc} - -Only a few routines in \cdromc\ are exported to the drivers. In this -new section we will discuss these, as well as the functions that `take -over' the \cdrom\ interface to the kernel. The header file belonging -to \cdromc\ is called \cdromh. Formerly, some of the contents of this -file were placed in the file {\tt {ucdrom.h}}, but this file has now been -merged back into \cdromh. - -\subsection{$Struct\ file_operations\ cdrom_fops$} - -The contents of this structure were described in section~\ref{cdrom.c}. -A pointer to this structure is assigned to the $fops$ field -of the $struct gendisk$. - -\subsection{$Int\ register_cdrom( struct\ cdrom_device_info\ * cdi)$} - -This function is used in about the same way one registers $cdrom_fops$ -with the kernel, the device operations and information structures, -as described in section~\ref{cdrom.c}, should be registered with the -\UCD: -$$ -register_cdrom(\&_info)); -$$ -This function returns zero upon success, and non-zero upon -failure. The structure $_info$ should have a pointer to the -driver's $_dops$, as in -$$ -\vbox{\halign{&$#$\hfil\cr -struct\ &cdrom_device_info\ _info = \{\cr -& _dops;\cr -&\ldots\cr -\}\cr -}}$$ -Note that a driver must have one static structure, $_dops$, while -it may have as many structures $_info$ as there are minor devices -active. $Register_cdrom()$ builds a linked list from these. - -\subsection{$Void\ unregister_cdrom(struct\ cdrom_device_info * cdi)$} - -Unregistering device $cdi$ with minor number $MINOR(cdi\to dev)$ removes -the minor device from the list. 
If it was the last registered minor for -the low-level driver, this disconnects the registered device-operation -routines from the \cdrom\ interface. This function returns zero upon -success, and non-zero upon failure. - -\subsection{$Int\ cdrom_open(struct\ inode * ip, struct\ file * fp)$} - -This function is not called directly by the low-level drivers, it is -listed in the standard $cdrom_fops$. If the VFS opens a file, this -function becomes active. A strategy is implemented in this routine, -taking care of all capabilities and options that are set in the -$cdrom_device_ops$ connected to the device. Then, the program flow is -transferred to the device_dependent $open()$ call. - -\subsection{$Void\ cdrom_release(struct\ inode *ip, struct\ file -*fp)$} - -This function implements the reverse-logic of $cdrom_open()$, and then -calls the device-dependent $release()$ routine. When the use-count has -reached 0, the allocated buffers are flushed by calls to $sync_dev(dev)$ -and $invalidate_buffers(dev)$. - - -\subsection{$Int\ cdrom_ioctl(struct\ inode *ip, struct\ file *fp, -unsigned\ int\ cmd, unsigned\ long\ arg)$} -\label{cdrom-ioctl} - -This function handles all the standard $ioctl$ requests for \cdrom\ -devices in a uniform way. The different calls fall into three -categories: $ioctl$s that can be directly implemented by device -operations, ones that are routed through the call $audio_ioctl()$, and -the remaining ones, that are presumable device-dependent. Generally, a -negative return value indicates an error. - -\subsubsection{Directly implemented $ioctl$s} -\label{ioctl-direct} - -The following `old' \cdrom-$ioctl$s are implemented by directly -calling device-operations in $cdrom_device_ops$, if implemented and -not masked: -\begin{description} -\item[CDROMMULTISESSION] Requests the last session on a \cdrom. -\item[CDROMEJECT] Open tray. -\item[CDROMCLOSETRAY] Close tray. 
-\item[CDROMEJECT_SW] If $arg\not=0$, set behavior to auto-close (close -tray on first open) and auto-eject (eject on last release), otherwise -set behavior to non-moving on $open()$ and $release()$ calls. -\item[CDROM_GET_MCN] Get the Media Catalog Number from a CD. -\end{description} - -\subsubsection{$Ioctl$s routed through $audio_ioctl()$} -\label{ioctl-audio} - -The following set of $ioctl$s are all implemented through a call to -the $cdrom_fops$ function $audio_ioctl()$. Memory checks and -allocation are performed in $cdrom_ioctl()$, and also sanitization of -address format ($CDROM_LBA$/$CDROM_MSF$) is done. -\begin{description} -\item[CDROMSUBCHNL] Get sub-channel data in argument $arg$ of type $struct\ -cdrom_subchnl *{}$. -\item[CDROMREADTOCHDR] Read Table of Contents header, in $arg$ of type -$struct\ cdrom_tochdr *{}$. -\item[CDROMREADTOCENTRY] Read a Table of Contents entry in $arg$ and -specified by $arg$ of type $struct\ cdrom_tocentry *{}$. -\item[CDROMPLAYMSF] Play audio fragment specified in Minute, Second, -Frame format, delimited by $arg$ of type $struct\ cdrom_msf *{}$. -\item[CDROMPLAYTRKIND] Play audio fragment in track-index format -delimited by $arg$ of type $struct\ \penalty-1000 cdrom_ti *{}$. -\item[CDROMVOLCTRL] Set volume specified by $arg$ of type $struct\ -cdrom_volctrl *{}$. -\item[CDROMVOLREAD] Read volume into by $arg$ of type $struct\ -cdrom_volctrl *{}$. -\item[CDROMSTART] Spin up disc. -\item[CDROMSTOP] Stop playback of audio fragment. -\item[CDROMPAUSE] Pause playback of audio fragment. -\item[CDROMRESUME] Resume playing. -\end{description} - -\subsubsection{New $ioctl$s in \cdromc} - -The following $ioctl$s have been introduced to allow user programs to -control the behavior of individual \cdrom\ devices. New $ioctl$ -commands can be identified by the underscores in their names. -\begin{description} -\item[CDROM_SET_OPTIONS] Set options specified by $arg$. Returns the -option flag register after modification. 
Use $arg = \rm0$ for reading -the current flags. -\item[CDROM_CLEAR_OPTIONS] Clear options specified by $arg$. Returns - the option flag register after modification. -\item[CDROM_SELECT_SPEED] Select head-rate speed of disc specified as - by $arg$ in units of standard cdrom speed (176\,kB/sec raw data or - 150\,kB/sec file system data). The value 0 means `auto-select', \ie, - play audio discs at real time and data discs at maximum speed. The value - $arg$ is checked against the maximum head rate of the drive found in the - $cdrom_dops$. -\item[CDROM_SELECT_DISC] Select disc numbered $arg$ from a juke-box. - First disc is numbered 0. The number $arg$ is checked against the - maximum number of discs in the juke-box found in the $cdrom_dops$. -\item[CDROM_MEDIA_CHANGED] Returns 1 if a disc has been changed since - the last call. Note that calls to $cdrom_media_changed$ by the VFS - are treated by an independent queue, so both mechanisms will detect - a media change once. For juke-boxes, an extra argument $arg$ - specifies the slot for which the information is given. The special - value $CDSL_CURRENT$ requests that information about the currently - selected slot be returned. -\item[CDROM_DRIVE_STATUS] Returns the status of the drive by a call to - $drive_status()$. Return values are defined in section~\ref{drive - status}. Note that this call doesn't return information on the - current playing activity of the drive; this can be polled through an - $ioctl$ call to $CDROMSUBCHNL$. For juke-boxes, an extra argument - $arg$ specifies the slot for which (possibly limited) information is - given. The special value $CDSL_CURRENT$ requests that information - about the currently selected slot be returned. -\item[CDROM_DISC_STATUS] Returns the type of the disc currently in the - drive. It should be viewed as a complement to $CDROM_DRIVE_STATUS$. - This $ioctl$ can provide \emph {some} information about the current - disc that is inserted in the drive. 
This functionality used to be - implemented in the low level drivers, but is now carried out - entirely in \UCD. - - The history of development of the CD's use as a carrier medium for - various digital information has lead to many different disc types. - This $ioctl$ is useful only in the case that CDs have \emph {only - one} type of data on them. While this is often the case, it is - also very common for CDs to have some tracks with data, and some - tracks with audio. Because this is an existing interface, rather - than fixing this interface by changing the assumptions it was made - under, thereby breaking all user applications that use this - function, the \UCD\ implements this $ioctl$ as follows: If the CD in - question has audio tracks on it, and it has absolutely no CD-I, XA, - or data tracks on it, it will be reported as $CDS_AUDIO$. If it has - both audio and data tracks, it will return $CDS_MIXED$. If there - are no audio tracks on the disc, and if the CD in question has any - CD-I tracks on it, it will be reported as $CDS_XA_2_2$. Failing - that, if the CD in question has any XA tracks on it, it will be - reported as $CDS_XA_2_1$. Finally, if the CD in question has any - data tracks on it, it will be reported as a data CD ($CDS_DATA_1$). - - This $ioctl$ can return: - $$ - \halign{$#$\ \hfil&$/*$ \rm# $*/$\hfil\cr - CDS_NO_INFO& no information available\cr - CDS_NO_DISC& no disc is inserted, or tray is opened\cr - CDS_AUDIO& Audio disc (2352 audio bytes/frame)\cr - CDS_DATA_1& data disc, mode 1 (2048 user bytes/frame)\cr - CDS_XA_2_1& mixed data (XA), mode 2, form 1 (2048 user bytes)\cr - CDS_XA_2_2& mixed data (XA), mode 2, form 1 (2324 user bytes)\cr - CDS_MIXED& mixed audio/data disc\cr - } - $$ - For some information concerning frame layout of the various disc - types, see a recent version of \cdromh. - -\item[CDROM_CHANGER_NSLOTS] Returns the number of slots in a - juke-box. -\item[CDROMRESET] Reset the drive. 
-\item[CDROM_GET_CAPABILITY] Returns the $capability$ flags for the - drive. Refer to section \ref{capability} for more information on - these flags. -\item[CDROM_LOCKDOOR] Locks the door of the drive. $arg == \rm0$ - unlocks the door, any other value locks it. -\item[CDROM_DEBUG] Turns on debugging info. Only root is allowed - to do this. Same semantics as CDROM_LOCKDOOR. -\end{description} - -\subsubsection{Device dependent $ioctl$s} - -Finally, all other $ioctl$s are passed to the function $dev_ioctl()$, -if implemented. No memory allocation or verification is carried out. - -\newsection{How to update your driver} - -\begin{enumerate} -\item Make a backup of your current driver. -\item Get hold of the files \cdromc\ and \cdromh, they should be in - the directory tree that came with this documentation. -\item Make sure you include \cdromh. -\item Change the 3rd argument of $register_blkdev$ from -$\&_fops$ to $\&cdrom_fops$. -\item Just after that line, add the following to register with the \UCD: - $$register_cdrom(\&_info);$$ - Similarly, add a call to $unregister_cdrom()$ at the appropriate place. -\item Copy an example of the device-operations $struct$ to your - source, \eg, from {\tt {cm206.c}} $cm206_dops$, and change all - entries to names corresponding to your driver, or names you just - happen to like. If your driver doesn't support a certain function, - make the entry $NULL$. At the entry $capability$ you should list all - capabilities your driver currently supports. If your driver - has a capability that is not listed, please send me a message. -\item Copy the $cdrom_device_info$ declaration from the same example - driver, and modify the entries according to your needs. If your - driver dynamically determines the capabilities of the hardware, this - structure should also be declared dynamically. -\item Implement all functions in your $_dops$ structure, - according to prototypes listed in \cdromh, and specifications given - in section~\ref{cdrom.c}. 
Most likely you have already implemented - the code in a large part, and you will almost certainly need to adapt the - prototype and return values. -\item Rename your $_ioctl()$ function to $audio_ioctl$ and - change the prototype a little. Remove entries listed in the first - part in section~\ref{cdrom-ioctl}, if your code was OK, these are - just calls to the routines you adapted in the previous step. -\item You may remove all remaining memory checking code in the - $audio_ioctl()$ function that deals with audio commands (these are - listed in the second part of section~\ref{cdrom-ioctl}). There is no - need for memory allocation either, so most $case$s in the $switch$ - statement look similar to: - $$ - case\ CDROMREADTOCENTRY\colon get_toc_entry\bigl((struct\ - cdrom_tocentry *{})\ arg\bigr); - $$ -\item All remaining $ioctl$ cases must be moved to a separate - function, $_ioctl$, the device-dependent $ioctl$s. Note that - memory checking and allocation must be kept in this code! -\item Change the prototypes of $_open()$ and - $_release()$, and remove any strategic code (\ie, tray - movement, door locking, etc.). -\item Try to recompile the drivers. We advise you to use modules, both - for {\tt {cdrom.o}} and your driver, as debugging is much easier this - way. -\end{enumerate} - -\newsection{Thanks} - -Thanks to all the people involved. First, Erik Andersen, who has -taken over the torch in maintaining \cdromc\ and integrating much -\cdrom-related code in the 2.1-kernel. Thanks to Scott Snyder and -Gerd Knorr, who were the first to implement this interface for SCSI -and IDE-CD drivers and added many ideas for extension of the data -structures relative to kernel~2.0. Further thanks to Heiko Ei{\ss}feldt, -Thomas Quinot, Jon Tombs, Ken Pizzini, Eberhard M\"onkeberg and Andrew -Kroll, the \linux\ \cdrom\ device driver developers who were kind -enough to give suggestions and criticisms during the writing. 
Finally -of course, I want to thank Linus Torvalds for making this possible in -the first place. - -\vfill -$ \version\ $ -\eject -\end{document} diff --git a/Documentation/cdrom/cdrom-standard.txt b/Documentation/cdrom/cdrom-standard.txt new file mode 100644 index 000000000000..dde4f7f7fdbf --- /dev/null +++ b/Documentation/cdrom/cdrom-standard.txt @@ -0,0 +1,1063 @@ +======================= +A Linux CD-ROM standard +======================= + +:Author: David van Leeuwen +:Date: 12 March 1999 +:Updated by: Erik Andersen (andersee@debian.org) +:Updated by: Jens Axboe (axboe@image.dk) + + +Introduction +============ + +Linux is probably the Unix-like operating system that supports +the widest variety of hardware devices. The reasons for this are +presumably + +- The large list of hardware devices available for the many platforms + that Linux now supports (i.e., i386-PCs, Sparc Suns, etc.) +- The open design of the operating system, such that anybody can write a + driver for Linux. +- There is plenty of source code around as examples of how to write a driver. + +The openness of Linux, and the many different types of available +hardware has allowed Linux to support many different hardware devices. +Unfortunately, the very openness that has allowed Linux to support +all these different devices has also allowed the behavior of each +device driver to differ significantly from one device to another. +This divergence of behavior has been very significant for CD-ROM +devices; the way a particular drive reacts to a `standard` *ioctl()* +call varies greatly from one device driver to another. To avoid making +their drivers totally inconsistent, the writers of Linux CD-ROM +drivers generally created new device drivers by understanding, copying, +and then changing an existing one. Unfortunately, this practice did not +maintain uniform behavior across all the Linux CD-ROM drivers. 
+
+This document describes an effort to establish uniform behavior across
+all the different CD-ROM device drivers for Linux. This document also
+defines the various *ioctl()* calls, and how the low-level CD-ROM device
+drivers should implement them. Currently (as of the Linux 2.1.\ *x*
+development kernels) several low-level CD-ROM device drivers, including
+both IDE/ATAPI and SCSI, now use this Uniform interface.
+
+When the CD-ROM was developed, the interface between the CD-ROM drive
+and the computer was not specified in the standards. As a result, many
+different CD-ROM interfaces were developed. Some of them had their
+own proprietary design (Sony, Mitsumi, Panasonic, Philips); other
+manufacturers adopted an existing electrical interface and changed
+the functionality (CreativeLabs/SoundBlaster, Teac, Funai) or simply
+adapted their drives to one or more of the already existing electrical
+interfaces (Aztech, Sanyo, Funai, Vertos, Longshine, Optics Storage and
+most of the `NoName` manufacturers). In cases where a new drive really
+brought its own interface or used its own command set and flow control
+scheme, either a separate driver had to be written, or an existing
+driver had to be enhanced. History has delivered us CD-ROM support for
+many of these different interfaces. Nowadays, almost all new CD-ROM
+drives are either IDE/ATAPI or SCSI, and it is very unlikely that any
+manufacturer will create a new interface. Even finding drives for the
+old proprietary interfaces is getting difficult.
+
+When (in the 1.3.70's) I looked at the existing software interface,
+which was expressed through `cdrom.h`, it appeared to be a rather wild
+set of commands and data formats [#f1]_. It seemed that many
+features of the software interface had been added to accommodate the
+capabilities of a particular drive, in an *ad hoc* manner. More
+importantly, it appeared that the behavior of the `standard` commands
+was different for most of the different drivers: e.g., some drivers
+close the tray if an *open()* call occurs when the tray is open, while
+others do not. Some drivers lock the door upon opening the device, to
+prevent an incoherent file system, but others don't, to allow software
+ejection. Undoubtedly, the capabilities of the different drives vary,
+but even when two drives have the same capability their drivers'
+behavior was usually different.
+
+.. [#f1]
+   I cannot recollect what kernel version I looked at, then,
+   presumably 1.2.13 and 1.3.34 --- the latest kernel that I was
+   indirectly involved in.
+
+I decided to start a discussion on how to make all the Linux CD-ROM
+drivers behave more uniformly. I began by contacting the developers of
+the many CD-ROM drivers found in the Linux kernel. Their reactions
+encouraged me to write the Uniform CD-ROM Driver which this document is
+intended to describe. The implementation of the Uniform CD-ROM Driver is
+in the file `cdrom.c`. This driver is intended to be an additional software
+layer that sits on top of the low-level device drivers for each CD-ROM drive.
+By adding this additional layer, it is possible to have all the different
+CD-ROM devices behave **exactly** the same (insofar as the underlying
+hardware will allow).
+
+The goal of the Uniform CD-ROM Driver is **not** to alienate driver developers
+who have not yet taken steps to support this effort. The goal of the Uniform
+CD-ROM Driver is simply to give people writing application programs for CD-ROM
+drives **one** Linux CD-ROM interface with consistent behavior for all
+CD-ROM devices. In addition, this also provides a consistent interface
+between the low-level device driver code and the Linux kernel. Care
+is taken that 100% compatibility exists with the data structures and
+programmer's interface defined in `cdrom.h`. This guide was written to
+help CD-ROM driver developers adapt their code to use the Uniform CD-ROM
+Driver code defined in `cdrom.c`.
+
+Personally, I think that the most important hardware interfaces are
+the IDE/ATAPI drives and, of course, the SCSI drives, but as prices
+of hardware drop continuously, it is also likely that people may have
+more than one CD-ROM drive, possibly of mixed types. It is important
+that these drives behave in the same way. In December 1994, one of the
+cheapest CD-ROM drives was a Philips cm206, a double-speed proprietary
+drive. In the months that I was busy writing a Linux driver for it,
+proprietary drives became obsolete and IDE/ATAPI drives became the
+standard. At the time of the last update to this document (November
+1997) it was becoming difficult even to **find** anything slower than a
+16-speed CD-ROM drive, and 24-speed drives were common.
+
+.. _cdrom_api:
+
+Standardizing through another software level
+============================================
+
+At the time this document was conceived, all drivers directly
+implemented the CD-ROM *ioctl()* calls through their own routines. This
+led to the danger of different drivers forgetting to do important things
+like checking that the user was giving the driver valid data. More
+importantly, this led to the divergence of behavior, which has already
+been discussed.
+
+For this reason, the Uniform CD-ROM Driver was created to enforce consistent
+CD-ROM drive behavior, and to provide a common set of services to the various
+low-level CD-ROM device drivers. The Uniform CD-ROM Driver now provides another
+software level that separates the *ioctl()* and *open()* implementation
+from the actual hardware implementation. Note that this effort has
+made few changes which will affect a user's application programs. The
+greatest change involved moving the contents of the various low-level
+CD-ROM drivers' header files to the kernel's cdrom directory. This was
+done to help ensure that the user is presented with only one CD-ROM
+interface, the interface defined in `cdrom.h`.
+
+CD-ROM drives are specific enough (i.e., different from other
+block-devices such as floppy or hard disc drives), to define a set
+of common **CD-ROM device operations**, *_dops*.
+These operations are different from the classical block-device file
+operations, *_fops*.
+
+The routines for the Uniform CD-ROM Driver interface level are implemented
+in the file `cdrom.c`. In this file, the Uniform CD-ROM Driver interfaces
+with the kernel as a block device by registering the following general
+*struct file_operations*::
+
+        struct file_operations cdrom_fops = {
+                NULL,                   /* lseek */
+                block_read,             /* read (general block-dev read) */
+                block_write,            /* write (general block-dev write) */
+                NULL,                   /* readdir */
+                NULL,                   /* select */
+                cdrom_ioctl,            /* ioctl */
+                NULL,                   /* mmap */
+                cdrom_open,             /* open */
+                cdrom_release,          /* release */
+                NULL,                   /* fsync */
+                NULL,                   /* fasync */
+                cdrom_media_changed,    /* media change */
+                NULL                    /* revalidate */
+        };
+
+Every active CD-ROM device shares this *struct*. The routines
+declared above are all implemented in `cdrom.c`, since this file is the
+place where the behavior of all CD-ROM-devices is defined and
+standardized. The actual interface to the various types of CD-ROM
+hardware is still performed by various low-level CD-ROM-device
+drivers. These routines simply implement certain **capabilities**
+that are common to all CD-ROM (and really, all removable-media)
+devices.
+
+Registration of a low-level CD-ROM device driver is now done through
+the general routines in `cdrom.c`, not through the Virtual File System
+(VFS) any more. The interface implemented in `cdrom.c` is carried out
+through two general structures that contain information about the
+capabilities of the driver, and the specific drives on which the
+driver operates. The structures are:
+
+cdrom_device_ops
+  This structure contains information about the low-level driver for a
+  CD-ROM device. This structure is conceptually connected to the major
+  number of the device (although some drivers may have different
+  major numbers, as is the case for the IDE driver).
+
+cdrom_device_info
+  This structure contains information about a particular CD-ROM drive,
+  such as its device name, speed, etc. This structure is conceptually
+  connected to the minor number of the device.
+
+Registering a particular CD-ROM drive with the Uniform CD-ROM Driver
+is done by the low-level device driver through a call to::
+
+        register_cdrom(struct cdrom_device_info * _info)
+
+The device information structure, *_info*, contains all the
+information needed for the kernel to interface with the low-level
+CD-ROM device driver. One of the most important entries in this
+structure is a pointer to the *cdrom_device_ops* structure of the
+low-level driver.
+
+The device operations structure, *cdrom_device_ops*, contains a list
+of pointers to the functions which are implemented in the low-level
+device driver. When `cdrom.c` accesses a CD-ROM device, it does it
+through the functions in this structure. It is impossible to know all
+the capabilities of future CD-ROM drives, so it is expected that this
+list may need to be expanded from time to time as new technologies are
+developed. For example, CD-R and CD-R/W drives are beginning to become
+popular, and support will soon need to be added for them.
For now, the +current *struct* is:: + + struct cdrom_device_ops { + int (*open)(struct cdrom_device_info *, int); + void (*release)(struct cdrom_device_info *); + int (*drive_status)(struct cdrom_device_info *, int); + unsigned int (*check_events)(struct cdrom_device_info *, + unsigned int, int); + int (*media_changed)(struct cdrom_device_info *, int); + int (*tray_move)(struct cdrom_device_info *, int); + int (*lock_door)(struct cdrom_device_info *, int); + int (*select_speed)(struct cdrom_device_info *, int); + int (*select_disc)(struct cdrom_device_info *, int); + int (*get_last_session)(struct cdrom_device_info *, + struct cdrom_multisession *); + int (*get_mcn)(struct cdrom_device_info *, struct cdrom_mcn *); + int (*reset)(struct cdrom_device_info *); + int (*audio_ioctl)(struct cdrom_device_info *, + unsigned int, void *); + const int capability; /* capability flags */ + int (*generic_packet)(struct cdrom_device_info *, + struct packet_command *); + }; + +When a low-level device driver implements one of these capabilities, +it should add a function pointer to this *struct*. When a particular +function is not implemented, however, this *struct* should contain a +NULL instead. The *capability* flags specify the capabilities of the +CD-ROM hardware and/or low-level CD-ROM driver when a CD-ROM drive +is registered with the Uniform CD-ROM Driver. + +Note that most functions have fewer parameters than their +*blkdev_fops* counterparts. This is because very little of the +information in the structures *inode* and *file* is used. For most +drivers, the main parameter is the *struct* *cdrom_device_info*, from +which the major and minor number can be extracted. (Most low-level +CD-ROM drivers don't even look at the major and minor number though, +since many of them only support one device.) This will be available +through *dev* in *cdrom_device_info* described below.
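As a rough illustration of the convention described above (function pointers for the operations a driver implements, NULL for the rest, and matching capability flags), an operations table might be filled in as follows. This is only a sketch meant to compile outside the kernel tree: the structure is cut down to a few members, the `demo_*` names are hypothetical, and the flag values are placeholders for the constants a real driver would take from `cdrom.h`.

```c
#include <stddef.h>

/* Stand-in declarations so the sketch compiles outside the kernel tree. */
struct cdrom_device_info;

/* Cut-down version of the structure shown above (assumption: only a few members). */
struct cdrom_device_ops {
	int (*open)(struct cdrom_device_info *, int);
	void (*release)(struct cdrom_device_info *);
	int (*tray_move)(struct cdrom_device_info *, int);
	int (*lock_door)(struct cdrom_device_info *, int);
	int (*select_speed)(struct cdrom_device_info *, int);
	const int capability;	/* capability flags */
};

/* Placeholder flag values for illustration; a real driver uses cdrom.h. */
#define CDC_CLOSE_TRAY	0x1
#define CDC_OPEN_TRAY	0x2

/* Hypothetical driver routines; each returns 0 for success. */
static int demo_open(struct cdrom_device_info *cdi, int purpose)
{
	(void)cdi;
	(void)purpose;
	return 0;
}

static void demo_release(struct cdrom_device_info *cdi)
{
	(void)cdi;
}

static int demo_tray_move(struct cdrom_device_info *cdi, int position)
{
	(void)cdi;
	(void)position;
	return 0;
}

/* Implemented operations get a pointer, everything else stays NULL. */
static const struct cdrom_device_ops demo_dops = {
	.open		= demo_open,
	.release	= demo_release,
	.tray_move	= demo_tray_move,
	.lock_door	= NULL,		/* this drive cannot lock its door */
	.select_speed	= NULL,		/* fixed-speed drive */
	.capability	= CDC_CLOSE_TRAY | CDC_OPEN_TRAY,
};
```

Keeping the NULL entries explicit makes it easy to see at a glance which capability flags must not be advertised.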
+ +The drive-specific, minor-like information that is registered with +`cdrom.c`, currently contains the following fields:: + + struct cdrom_device_info { + const struct cdrom_device_ops * ops; /* device operations for this major */ + struct list_head list; /* linked list of all device_info */ + struct gendisk * disk; /* matching block layer disk */ + void * handle; /* driver-dependent data */ + + int mask; /* mask of capability: disables them */ + int speed; /* maximum speed for reading data */ + int capacity; /* number of discs in a jukebox */ + + unsigned int options:30; /* options flags */ + unsigned mc_flags:2; /* media-change buffer flags */ + unsigned int vfs_events; /* cached events for vfs path */ + unsigned int ioctl_events; /* cached events for ioctl path */ + int use_count; /* number of times device is opened */ + char name[20]; /* name of the device type */ + + __u8 sanyo_slot : 2; /* Sanyo 3-CD changer support */ + __u8 keeplocked : 1; /* CDROM_LOCKDOOR status */ + __u8 reserved : 5; /* not used yet */ + int cdda_method; /* see CDDA_* flags */ + __u8 last_sense; /* saves last sense key */ + __u8 media_written; /* dirty flag, DVD+RW bookkeeping */ + unsigned short mmc3_profile; /* current MMC3 profile */ + int for_data; /* unknown:TBD */ + int (*exit)(struct cdrom_device_info *);/* unknown:TBD */ + int mrw_mode_page; /* which MRW mode page is in use */ + }; + +Using this *struct*, a linked list of the registered minor devices is +built, using the *next* field. The device number, the device operations +struct and specifications of properties of the drive are stored in this +structure. + +The *mask* flags can be used to mask out some of the capabilities listed +in *ops->capability*, if a specific drive doesn't support a feature +of the driver. The value *speed* specifies the maximum head-rate of the +drive, measured in units of normal audio speed (176kB/sec raw data or +150kB/sec file system data). 
The parameters are declared *const* +because they describe properties of the drive, which don't change after +registration. + +A few registers contain variables local to the CD-ROM drive. The +flags *options* are used to specify how the general CD-ROM routines +should behave. These various flags registers should provide enough +flexibility to adapt to the different users' wishes (and **not** the +`arbitrary` wishes of the author of the low-level device driver, as is +the case in the old scheme). The register *mc_flags* is used to buffer +the information from *media_changed()* to two separate queues. Other +data that is specific to a minor drive, can be accessed through *handle*, +which can point to a data structure specific to the low-level driver. +The fields *use_count*, *next*, *options* and *mc_flags* need not be +initialized. + +The intermediate software layer that `cdrom.c` forms will perform some +additional bookkeeping. The use count of the device (the number of +processes that have the device opened) is registered in *use_count*. The +function *cdrom_ioctl()* will verify the appropriate user-memory regions +for read and write, and in case a location on the CD is transferred, +it will `sanitize` the format by making requests to the low-level +drivers in a standard format, and translating all formats between the +user-software and low level drivers. This relieves much of the drivers' +memory checking and format checking and translation. Also, the necessary +structures will be declared on the program stack. + +The implementation of the functions should be as defined in the +following sections. Two functions **must** be implemented, namely +*open()* and *release()*. Other functions may be omitted, their +corresponding capability flags will be cleared upon registration. +Generally, a function returns zero on success and negative on error. 
A +function call should return only after the command has completed, but of +course waiting for the device should not use processor time. + +:: + + int open(struct cdrom_device_info *cdi, int purpose) + +*Open()* should try to open the device for a specific *purpose*, which +can be either: + +- Open for reading data, as done by `mount(2)`, or the + user commands `dd` or `cat`. +- Open for *ioctl* commands, as done by audio-CD playing programs. + +Notice that any strategic code (closing tray upon *open()*, etc.) is +done by the calling routine in `cdrom.c`, so the low-level routine +should only be concerned with proper initialization, such as spinning +up the disc, etc. + +:: + + void release(struct cdrom_device_info *cdi) + +Device-specific actions, such as spinning down the device, should be taken here. +However, strategic actions such as ejection of the tray, or unlocking +the door, should be left to the general routine *cdrom_release()*. +This is the only function returning type *void*. + +.. _cdrom_drive_status: + +:: + + int drive_status(struct cdrom_device_info *cdi, int slot_nr) + +The function *drive_status*, if implemented, should provide +information on the status of the drive (not the status of the disc, +which may or may not be in the drive). If the drive is not a changer, +*slot_nr* should be ignored. In `cdrom.h` the possibilities are listed:: + + + CDS_NO_INFO /* no information available */ + CDS_NO_DISC /* no disc is inserted, tray is closed */ + CDS_TRAY_OPEN /* tray is opened */ + CDS_DRIVE_NOT_READY /* something is wrong, tray is moving? */ + CDS_DISC_OK /* a disc is loaded and everything is fine */ + +:: + + int media_changed(struct cdrom_device_info *cdi, int disc_nr) + +This function is very similar to the original function in *struct +file_operations*. It returns 1 if the medium of the device *cdi->dev* +has changed since the last call, and 0 otherwise.
The parameter +*disc_nr* identifies a specific slot in a juke-box; it should be +ignored for single-disc drives. Note that by `re-routing` this +function through *cdrom_media_changed()*, we can implement separate +queues for the VFS and a new *ioctl()* function that can report device +changes to software (e.g., an auto-mounting daemon). + +:: + + int tray_move(struct cdrom_device_info *cdi, int position) + +This function, if implemented, should control the tray movement. (No +other function should control this.) The parameter *position* controls +the desired direction of movement: + +- 0 Close tray +- 1 Open tray + +This function returns 0 upon success, and a non-zero value upon +error. Note that if the tray is already in the desired position, no +action need be taken, and the return value should be 0. + +:: + + int lock_door(struct cdrom_device_info *cdi, int lock) + +This function (and no other code) controls locking of the door, if the +drive allows this. The value of *lock* controls the desired locking +state: + +- 0 Unlock door, manual opening is allowed +- 1 Lock door, tray cannot be ejected manually + +This function returns 0 upon success, and a non-zero value upon +error. Note that if the door is already in the requested state, no +action need be taken, and the return value should be 0. + +:: + + int select_speed(struct cdrom_device_info *cdi, int speed) + +Some CD-ROM drives are capable of changing their head-speed. There +are several reasons for changing the speed of a CD-ROM drive. Badly +pressed CD-ROMs may benefit from less-than-maximum head rate. Modern +CD-ROM drives can obtain very high head rates (up to *24x* is +common). It has been reported that these drives can make reading +errors at these high speeds; reducing the speed can prevent data loss +in these circumstances. Finally, some of these drives can +make an annoyingly loud noise, which a lower speed may reduce.
+ +This function specifies the speed at which data is read or audio is +played back. The value of *speed* specifies the head-speed of the +drive, measured in units of standard cdrom speed (176kB/sec raw data +or 150kB/sec file system data). So to request that a CD-ROM drive +operate at 300kB/sec you would call the CDROM_SELECT_SPEED *ioctl* +with *speed=2*. The special value `0` means `auto-selection`, i. e., +maximum data-rate or real-time audio rate. If the drive doesn't have +this `auto-selection` capability, the decision should be made on the +current disc loaded and the return value should be positive. A negative +return value indicates an error. + +:: + + int select_disc(struct cdrom_device_info *cdi, int number) + +If the drive can store multiple discs (a juke-box) this function +will perform disc selection. It should return the number of the +selected disc on success, a negative value on error. Currently, only +the ide-cd driver supports this functionality. + +:: + + int get_last_session(struct cdrom_device_info *cdi, + struct cdrom_multisession *ms_info) + +This function should implement the old corresponding *ioctl()*. For +device *cdi->dev*, the start of the last session of the current disc +should be returned in the pointer argument *ms_info*. Note that +routines in `cdrom.c` have sanitized this argument: its requested +format will **always** be of the type *CDROM_LBA* (linear block +addressing mode), whatever the calling software requested. But +sanitization goes even further: the low-level implementation may +return the requested information in *CDROM_MSF* format if it wishes so +(setting the *ms_info->addr_format* field appropriately, of +course) and the routines in `cdrom.c` will make the transformation if +necessary. The return value is 0 upon success. + +:: + + int get_mcn(struct cdrom_device_info *cdi, + struct cdrom_mcn *mcn) + +Some discs carry a `Media Catalog Number` (MCN), also called +`Universal Product Code` (UPC). 
This number should reflect the number +that is generally found in the bar-code on the product. Unfortunately, +the few discs that carry such a number on the disc don't even use the +same format. The return argument to this function is a pointer to a +pre-declared memory region of type *struct cdrom_mcn*. The MCN is +expected as a 13-character string, terminated by a null-character. + +:: + + int reset(struct cdrom_device_info *cdi) + +This call should perform a hard-reset on the drive (although in +circumstances that a hard-reset is necessary, a drive may very well not +listen to commands anymore). Preferably, control is returned to the +caller only after the drive has finished resetting. If the drive is no +longer listening, it may be wise for the underlying low-level cdrom +driver to time out. + +:: + + int audio_ioctl(struct cdrom_device_info *cdi, + unsigned int cmd, void *arg) + +Some of the CD-ROM-\ *ioctl()*\ 's defined in `cdrom.h` can be +implemented by the routines described above, and hence the function +*cdrom_ioctl* will use those. However, most *ioctl()*\ 's deal with +audio-control. We have decided to leave these to be accessed through a +single function, repeating the arguments *cmd* and *arg*. Note that +the latter is of type *void*, rather than *unsigned long int*. +The routine *cdrom_ioctl()* does do some useful things, +though. It sanitizes the address format type to *CDROM_MSF* (Minutes, +Seconds, Frames) for all audio calls. It also verifies the memory +location of *arg*, and reserves stack-memory for the argument. This +makes implementation of the *audio_ioctl()* much simpler than in the +old driver scheme. For example, you may look up the function +*cm206_audio_ioctl()* `cm206.c` that should be updated with +this documentation. + +An unimplemented ioctl should return *-ENOSYS*, but a harmless request +(e. g., *CDROMSTART*) may be ignored by returning 0 (success). Other +errors should be according to the standards, whatever they are. 
When +an error is returned by the low-level driver, the Uniform CD-ROM Driver +tries whenever possible to return the error code to the calling program. +(We may decide to sanitize the return value in *cdrom_ioctl()* though, in +order to guarantee a uniform interface to the audio-player software.) + +:: + + int dev_ioctl(struct cdrom_device_info *cdi, + unsigned int cmd, unsigned long arg) + +Some *ioctl()'s* seem to be specific to certain CD-ROM drives. That is, +they are introduced to service some capabilities of certain drives. In +fact, there are 6 different *ioctl()'s* for reading data, either in some +particular kind of format, or audio data. Not many drives support +reading audio tracks as data; I believe this is because of protection +of copyrights of artists. Moreover, I think that if audio-tracks are +supported, it should be done through the VFS and not via *ioctl()'s*. A +problem here could be the fact that audio-frames are 2352 bytes long, +so either the audio-file-system should ask for 75264 bytes at once +(the least common multiple of 512 and 2352), or the drivers should +bend their backs to cope with this incoherence (to which I would be +opposed). Furthermore, it is very difficult for the hardware to find +the exact frame boundaries, since there are no synchronization headers +in audio frames. Once these issues are resolved, this code should be +standardized in `cdrom.c`. + +Because there are so many *ioctl()'s* that seem to be introduced to +satisfy certain drivers [#f2]_, any non-standard *ioctl()*\ s +are routed through the call *dev_ioctl()*. In principle, `private` +*ioctl()*\ 's should be numbered after the device's major number, and not +the general CD-ROM *ioctl* number, `0x53`. Currently the +non-supported *ioctl()'s* are: + + CDROMREADMODE1, CDROMREADMODE2, CDROMREADAUDIO, CDROMREADRAW, + CDROMREADCOOKED, CDROMSEEK, CDROMPLAYBLK and CDROMREADALL + +.. [#f2] + + Is there software around that actually uses these? I'd be interested! + +..
_cdrom_capabilities: + +CD-ROM capabilities +------------------- + +Instead of just implementing some *ioctl* calls, the interface in +`cdrom.c` supplies the possibility to indicate the **capabilities** +of a CD-ROM drive. This can be done by ORing any number of +capability-constants that are defined in `cdrom.h` at the registration +phase. Currently, the capabilities are any of:: + + CDC_CLOSE_TRAY /* can close tray by software control */ + CDC_OPEN_TRAY /* can open tray */ + CDC_LOCK /* can lock and unlock the door */ + CDC_SELECT_SPEED /* can select speed, in units of ~150 kB/s */ + CDC_SELECT_DISC /* drive is juke-box */ + CDC_MULTI_SESSION /* can read sessions > 1 */ + CDC_MCN /* can read Media Catalog Number */ + CDC_MEDIA_CHANGED /* can report if disc has changed */ + CDC_PLAY_AUDIO /* can perform audio-functions (play, pause, etc) */ + CDC_RESET /* hard reset device */ + CDC_IOCTLS /* driver has non-standard ioctls */ + CDC_DRIVE_STATUS /* driver implements drive status */ + +The capability flag is declared *const*, to prevent drivers from +accidentally tampering with the contents. The capability flags actually +inform `cdrom.c` of what the driver can do. If the drive found +by the driver does not have the capability, it can be masked out by +the *cdrom_device_info* variable *mask*. For instance, the SCSI CD-ROM +driver has implemented the code for loading and ejecting CD-ROMs, and +hence its corresponding flags in *capability* will be set. But a SCSI +CD-ROM drive might be a caddy system, which can't load the tray, and +hence for this drive the *cdrom_device_info* struct will have set +the *CDC_CLOSE_TRAY* bit in *mask*. + +In the file `cdrom.c` you will encounter many constructions of the type:: + + if (cdo->capability & ~cdi->mask & CDC_<capability>) ... + +There is no *ioctl* to set the mask... The reason is that +I think it is better to control the **behavior** rather than the +**capabilities**.
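The capability test described above can be tried in isolation. The following is a minimal sketch, not the code from `cdrom.c`: the two structures are reduced to the members the test needs, `drive_can()` is a hypothetical helper name, and the flag values are placeholders for the constants in `cdrom.h`. It models the caddy example from the text, where the driver can close trays but this particular drive cannot.

```c
#include <stdbool.h>

/* Placeholder flag values for illustration; a real driver uses cdrom.h. */
#define CDC_CLOSE_TRAY	0x1
#define CDC_OPEN_TRAY	0x2
#define CDC_LOCK	0x4

/* Cut-down stand-ins for the structures described in the text. */
struct cdrom_device_ops {
	const int capability;	/* what the driver can do */
};

struct cdrom_device_info {
	const struct cdrom_device_ops *ops;
	int mask;		/* capabilities this drive lacks */
};

/*
 * Hypothetical helper: true if the driver offers `cap` and the
 * drive has not masked it out.
 */
static bool drive_can(const struct cdrom_device_info *cdi, int cap)
{
	return (cdi->ops->capability & ~cdi->mask & cap) != 0;
}

/* Driver-wide capabilities: tray control and door locking. */
static const struct cdrom_device_ops demo_ops = {
	.capability = CDC_CLOSE_TRAY | CDC_OPEN_TRAY | CDC_LOCK,
};

/* A caddy drive: the driver could close a tray, but this drive has none. */
static const struct cdrom_device_info caddy_drive = {
	.ops	= &demo_ops,
	.mask	= CDC_CLOSE_TRAY,
};
```

With these definitions, `drive_can(&caddy_drive, CDC_CLOSE_TRAY)` is false while the open-tray and lock capabilities remain available.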
+ +Options +------- + +A final flag register controls the **behavior** of the CD-ROM +drives, in order to satisfy different users' wishes, hopefully +independently of the ideas of the respective author who happened to +have made the drive's support available to the Linux community. The +current behavior options are:: + + CDO_AUTO_CLOSE /* try to close tray upon device open() */ + CDO_AUTO_EJECT /* try to open tray on last device close() */ + CDO_USE_FFLAGS /* use file_pointer->f_flags to indicate purpose for open() */ + CDO_LOCK /* try to lock door if device is opened */ + CDO_CHECK_TYPE /* ensure disc type is data if opened for data */ + +The initial value of this register is +`CDO_AUTO_CLOSE | CDO_USE_FFLAGS | CDO_LOCK`, reflecting my own view on user +interface and software standards. Before you protest, there are two +new *ioctl()'s* implemented in `cdrom.c`, that allow you to control the +behavior by software. These are:: + + CDROM_SET_OPTIONS /* set options specified in (int)arg */ + CDROM_CLEAR_OPTIONS /* clear options specified in (int)arg */ + +One option needs some more explanation: *CDO_USE_FFLAGS*. In the next +section we explain what the need for this option is. + +A software package `setcd`, available from the Debian distribution +and `sunsite.unc.edu`, allows user level control of these flags. + + +The need to know the purpose of opening the CD-ROM device +========================================================= + +Traditionally, Unix devices can be used in two different `modes`, +either by reading/writing to the device file, or by issuing +controlling commands to the device, by the device's *ioctl()* +call. The problem with CD-ROM drives is that they can be used for +two entirely different purposes. One is to mount removable +file systems (CD-ROMs); the other is to play audio CDs. Audio commands +are implemented entirely through *ioctl()'s*, presumably because the +first implementation (SUN?) has been such.
In principle there is +nothing wrong with this, but a good control of the `CD player` demands +that the device can **always** be opened in order to give the +*ioctl* commands, regardless of the state the drive is in. + +On the other hand, when used as a removable-media disc drive (what the +original purpose of CD-ROM s is) we would like to make sure that the +disc drive is ready for operation upon opening the device. In the old +scheme, some CD-ROM drivers don't do any integrity checking, resulting +in a number of i/o errors reported by the VFS to the kernel when an +attempt for mounting a CD-ROM on an empty drive occurs. This is not a +particularly elegant way to find out that there is no CD-ROM inserted; +it more-or-less looks like the old IBM-PC trying to read an empty floppy +drive for a couple of seconds, after which the system complains it +can't read from it. Nowadays we can **sense** the existence of a +removable medium in a drive, and we believe we should exploit that +fact. An integrity check on opening of the device, that verifies the +availability of a CD-ROM and its correct type (data), would be +desirable. + +These two ways of using a CD-ROM drive, principally for data and +secondarily for playing audio discs, have different demands for the +behavior of the *open()* call. Audio use simply wants to open the +device in order to get a file handle which is needed for issuing +*ioctl* commands, while data use wants to open for correct and +reliable data transfer. The only way user programs can indicate what +their *purpose* of opening the device is, is through the *flags* +parameter (see `open(2)`). For CD-ROM devices, these flags aren't +implemented (some drivers implement checking for write-related flags, +but this is not strictly necessary if the device file has correct +permission flags). Most option flags simply don't make sense to +CD-ROM devices: *O_CREAT*, *O_NOCTTY*, *O_TRUNC*, *O_APPEND*, and +*O_SYNC* have no meaning to a CD-ROM. 
+ +We therefore propose to use the flag *O_NONBLOCK* to indicate +that the device is opened just for issuing *ioctl* +commands. Strictly, the meaning of *O_NONBLOCK* is that opening and +subsequent calls to the device don't cause the calling process to +wait. We could interpret this as don't wait until someone has +inserted some valid data-CD-ROM. Thus, our proposal of the +implementation for the *open()* call for CD-ROM s is: + +- If no other flags are set than *O_RDONLY*, the device is opened + for data transfer, and the return value will be 0 only upon successful + initialization of the transfer. The call may even induce some actions + on the CD-ROM, such as closing the tray. +- If the option flag *O_NONBLOCK* is set, opening will always be + successful, unless the whole device doesn't exist. The drive will take + no actions whatsoever. + +And what about standards? +------------------------- + +You might hesitate to accept this proposal as it comes from the +Linux community, and not from some standardizing institute. What +about SUN, SGI, HP and all those other Unix and hardware vendors? +Well, these companies are in the lucky position that they generally +control both the hardware and software of their supported products, +and are large enough to set their own standard. They do not have to +deal with a dozen or more different, competing hardware +configurations\ [#f3]_. + +.. [#f3] + + Incidentally, I think that SUN's approach to mounting CD-ROM s is very + good in origin: under Solaris a volume-daemon automatically mounts a + newly inserted CD-ROM under `/cdrom/**`. + + In my opinion they should have pushed this + further and have **every** CD-ROM on the local area network be + mounted at the similar location, i. e., no matter in which particular + machine you insert a CD-ROM, it will always appear at the same + position in the directory tree, on every system. 
When I wanted to + implement such a user-program for Linux, I came across the + differences in behavior of the various drivers, and the need for an + *ioctl* informing about media changes. + +We believe that using *O_NONBLOCK* to indicate that a device is being opened +for *ioctl* commands only can be easily introduced in the Linux +community. All the CD-player authors will have to be informed, we can +even send in our own patches to the programs. The use of *O_NONBLOCK* +has most likely no influence on the behavior of the CD-players on +other operating systems than Linux. Finally, a user can always revert +to old behavior by a call to +*ioctl(file_descriptor, CDROM_CLEAR_OPTIONS, CDO_USE_FFLAGS)*. + +The preferred strategy of *open()* +---------------------------------- + +The routines in `cdrom.c` are designed in such a way that run-time +configuration of the behavior of CD-ROM devices (of **any** type) +can be carried out, by the *CDROM_SET/CLEAR_OPTIONS* *ioctls*. Thus, various +modes of operation can be set: + +`CDO_AUTO_CLOSE | CDO_USE_FFLAGS | CDO_LOCK` + This is the default setting. (With *CDO_CHECK_TYPE* it will be better, in + the future.) If the device is not yet opened by any other process, and if + the device is being opened for data (*O_NONBLOCK* is not set) and the + tray is found to be open, an attempt to close the tray is made. Then, + it is verified that a disc is in the drive and, if *CDO_CHECK_TYPE* is + set, that it contains tracks of type `data mode 1`. Only if all tests + are passed is the return value zero. The door is locked to prevent file + system corruption. If the drive is opened for audio (*O_NONBLOCK* is + set), no actions are taken and a value of 0 will be returned. + +`CDO_AUTO_CLOSE | CDO_AUTO_EJECT | CDO_LOCK` + This mimics the behavior of the current sbpcd-driver. The option flags are + ignored, the tray is closed on the first open, if necessary. Similarly, + the tray is opened on the last release, i. 
e., if a CD-ROM is unmounted, + it is automatically ejected, such that the user can replace it. + +We hope that these options can convince everybody (both driver +maintainers and user program developers) to adopt the new CD-ROM +driver scheme and option flag interpretation. + +Description of routines in `cdrom.c` +==================================== + +Only a few routines in `cdrom.c` are exported to the drivers. In this +section we will discuss these, as well as the functions that `take +over` the CD-ROM interface to the kernel. The header file belonging +to `cdrom.c` is called `cdrom.h`. Formerly, some of the contents of this +file were placed in the file `ucdrom.h`, but this file has now been +merged back into `cdrom.h`. + +:: + + struct file_operations cdrom_fops + +The contents of this structure were described in cdrom_api_. +A pointer to this structure is assigned to the *fops* field +of the *struct gendisk*. + +:: + + int register_cdrom(struct cdrom_device_info *cdi) + +In about the same way as one registers *cdrom_fops* +with the kernel, the device operations and information structures, +as described in cdrom_api_, should be registered with the +Uniform CD-ROM Driver:: + + register_cdrom(&_info); + + +This function returns zero upon success, and non-zero upon +failure. The structure *_info* should have a pointer to the +driver's *_dops*, as in:: + + struct cdrom_device_info _info = { + _dops; + ... + } + +Note that a driver must have one static structure, *_dops*, while +it may have as many structures *_info* as there are minor devices +active. *register_cdrom()* builds a linked list from these. + + +:: + + void unregister_cdrom(struct cdrom_device_info *cdi) + +Unregistering device *cdi* with minor number *MINOR(cdi->dev)* removes +the minor device from the list. If it was the last registered minor for +the low-level driver, this disconnects the registered device-operation +routines from the CD-ROM interface.
This function returns zero upon +success, and non-zero upon failure. + +:: + + int cdrom_open(struct inode * ip, struct file * fp) + +This function is not called directly by the low-level drivers; it is +listed in the standard *cdrom_fops*. If the VFS opens a file, this +function becomes active. A strategy is implemented in this routine, +taking care of all capabilities and options that are set in the +*cdrom_device_ops* connected to the device. Then, the program flow is +transferred to the device-dependent *open()* call. + +:: + + void cdrom_release(struct inode *ip, struct file *fp) + +This function implements the reverse-logic of *cdrom_open()*, and then +calls the device-dependent *release()* routine. When the use-count has +reached 0, the allocated buffers are flushed by calls to *sync_dev(dev)* +and *invalidate_buffers(dev)*. + + +.. _cdrom_ioctl: + +:: + + int cdrom_ioctl(struct inode *ip, struct file *fp, + unsigned int cmd, unsigned long arg) + +This function handles all the standard *ioctl* requests for CD-ROM +devices in a uniform way. The different calls fall into three +categories: *ioctl()'s* that can be directly implemented by device +operations, ones that are routed through the call *audio_ioctl()*, and +the remaining ones, that are presumably device-dependent. Generally, a +negative return value indicates an error. + +Directly implemented *ioctl()'s* +-------------------------------- + +The following `old` CD-ROM *ioctl()*\ 's are implemented by directly +calling device-operations in *cdrom_device_ops*, if implemented and +not masked: + +`CDROMMULTISESSION` + Requests the last session on a CD-ROM. +`CDROMEJECT` + Open tray. +`CDROMCLOSETRAY` + Close tray. +`CDROMEJECT_SW` + If *arg != 0*, set behavior to auto-close (close + tray on first open) and auto-eject (eject on last release), otherwise + set behavior to non-moving on *open()* and *release()* calls. +`CDROM_GET_MCN` + Get the Media Catalog Number from a CD.
+ +*Ioctl()'s* routed through *audio_ioctl()* +------------------------------------------ + +The following set of *ioctl()'s* are all implemented through a call to +the *cdrom_device_ops* function *audio_ioctl()*. Memory checks and +allocation are performed in *cdrom_ioctl()*, and also sanitization of +address format (*CDROM_LBA*/*CDROM_MSF*) is done. + +`CDROMSUBCHNL` + Get sub-channel data in argument *arg* of type + `struct cdrom_subchnl *`. +`CDROMREADTOCHDR` + Read Table of Contents header, in *arg* of type + `struct cdrom_tochdr *`. +`CDROMREADTOCENTRY` + Read a Table of Contents entry in *arg* and specified by *arg* + of type `struct cdrom_tocentry *`. +`CDROMPLAYMSF` + Play audio fragment specified in Minute, Second, Frame format, + delimited by *arg* of type `struct cdrom_msf *`. +`CDROMPLAYTRKIND` + Play audio fragment in track-index format delimited by *arg* + of type `struct cdrom_ti *`. +`CDROMVOLCTRL` + Set volume specified by *arg* of type `struct cdrom_volctrl *`. +`CDROMVOLREAD` + Read volume into *arg* of type `struct cdrom_volctrl *`. +`CDROMSTART` + Spin up disc. +`CDROMSTOP` + Stop playback of audio fragment. +`CDROMPAUSE` + Pause playback of audio fragment. +`CDROMRESUME` + Resume playing. + +New *ioctl()'s* in `cdrom.c` +---------------------------- + +The following *ioctl()'s* have been introduced to allow user programs to +control the behavior of individual CD-ROM devices. New *ioctl* +commands can be identified by the underscores in their names. + +`CDROM_SET_OPTIONS` + Set options specified by *arg*. Returns the option flag register + after modification. Use *arg = 0* for reading the current flags. +`CDROM_CLEAR_OPTIONS` + Clear options specified by *arg*. Returns the option flag register + after modification. +`CDROM_SELECT_SPEED` + Select head-rate speed of disc specified by *arg* in units + of standard cdrom speed (176 kB/sec raw data or + 150kB/sec file system data). The value 0 means `auto-select`, + i.
e., play audio discs at real time and data discs at maximum speed. + The value *arg* is checked against the maximum head rate of the + drive found in the *cdrom_dops*. +`CDROM_SELECT_DISC` + Select disc numbered *arg* from a juke-box. + + First disc is numbered 0. The number *arg* is checked against the + maximum number of discs in the juke-box found in the *cdrom_dops*. +`CDROM_MEDIA_CHANGED` + Returns 1 if a disc has been changed since the last call. + Note that calls to *cdrom_media_changed* by the VFS are treated + by an independent queue, so both mechanisms will detect a + media change once. For juke-boxes, an extra argument *arg* + specifies the slot for which the information is given. The special + value *CDSL_CURRENT* requests that information about the currently + selected slot be returned. +`CDROM_DRIVE_STATUS` + Returns the status of the drive by a call to + *drive_status()*. Return values are defined in cdrom_drive_status_. + Note that this call doesn't return information on the + current playing activity of the drive; this can be polled through + an *ioctl* call to *CDROMSUBCHNL*. For juke-boxes, an extra argument + *arg* specifies the slot for which (possibly limited) information is + given. The special value *CDSL_CURRENT* requests that information + about the currently selected slot be returned. +`CDROM_DISC_STATUS` + Returns the type of the disc currently in the drive. + It should be viewed as a complement to *CDROM_DRIVE_STATUS*. + This *ioctl* can provide *some* information about the current + disc that is inserted in the drive. This functionality used to be + implemented in the low level drivers, but is now carried out + entirely in the Uniform CD-ROM Driver. + + The history of development of the CD's use as a carrier medium for + various digital information has led to many different disc types. + This *ioctl* is useful only in the case that CDs have *only + one* type of data on them.
While this is often the case, it is + also very common for CDs to have some tracks with data, and some + tracks with audio. Because this is an existing interface, rather + than fixing this interface by changing the assumptions it was made + under, thereby breaking all user applications that use this + function, the Uniform CD-ROM Driver implements this *ioctl* as + follows: If the CD in question has audio tracks on it, and it has + absolutely no CD-I, XA, or data tracks on it, it will be reported + as *CDS_AUDIO*. If it has both audio and data tracks, it will + return *CDS_MIXED*. If there are no audio tracks on the disc, and + if the CD in question has any CD-I tracks on it, it will be + reported as *CDS_XA_2_2*. Failing that, if the CD in question + has any XA tracks on it, it will be reported as *CDS_XA_2_1*. + Finally, if the CD in question has any data tracks on it, + it will be reported as a data CD (*CDS_DATA_1*). + + This *ioctl* can return:: + + CDS_NO_INFO /* no information available */ + CDS_NO_DISC /* no disc is inserted, or tray is opened */ + CDS_AUDIO /* Audio disc (2352 audio bytes/frame) */ + CDS_DATA_1 /* data disc, mode 1 (2048 user bytes/frame) */ + CDS_XA_2_1 /* mixed data (XA), mode 2, form 1 (2048 user bytes) */ + CDS_XA_2_2 /* mixed data (XA), mode 2, form 1 (2324 user bytes) */ + CDS_MIXED /* mixed audio/data disc */ + + For some information concerning frame layout of the various disc + types, see a recent version of `cdrom.h`. + +`CDROM_CHANGER_NSLOTS` + Returns the number of slots in a juke-box. +`CDROMRESET` + Reset the drive. +`CDROM_GET_CAPABILITY` + Returns the *capability* flags for the drive. Refer to section + cdrom_capabilities_ for more information on these flags. +`CDROM_LOCKDOOR` + Locks the door of the drive. `arg == 0` unlocks the door, + any other value locks it. +`CDROM_DEBUG` + Turns on debugging info. Only root is allowed to do this. + Same semantics as CDROM_LOCKDOOR. 
+
+
+Device dependent *ioctl()'s*
+----------------------------
+
+Finally, all other *ioctl()'s* are passed to the function *dev_ioctl()*,
+if implemented. No memory allocation or verification is carried out.
+
+How to update your driver
+=========================
+
+- Make a backup of your current driver.
+- Get hold of the files `cdrom.c` and `cdrom.h`; they should be in
+  the directory tree that came with this documentation.
+- Make sure you include `cdrom.h`.
+- Change the 3rd argument of *register_blkdev* from `&_fops`
+  to `&cdrom_fops`.
+- Just after that line, add the following to register with the Uniform
+  CD-ROM Driver::
+
+        register_cdrom(&_info);
+
+  Similarly, add a call to *unregister_cdrom()* at the appropriate place.
+- Copy an example of the device-operations *struct* to your
+  source, e.g., from `cm206.c` *cm206_dops*, and change all
+  entries to names corresponding to your driver, or names you just
+  happen to like. If your driver doesn't support a certain function,
+  make the entry *NULL*. At the entry *capability* you should list all
+  capabilities your driver currently supports. If your driver
+  has a capability that is not listed, please send me a message.
+- Copy the *cdrom_device_info* declaration from the same example
+  driver, and modify the entries according to your needs. If your
+  driver dynamically determines the capabilities of the hardware, this
+  structure should also be declared dynamically.
+- Implement all functions in your `_dops` structure,
+  according to prototypes listed in `cdrom.h`, and specifications given
+  in cdrom_api_. Most likely you have already implemented
+  the code in a large part, and you will almost certainly need to adapt the
+  prototype and return values.
+- Rename your `_ioctl()` function to *audio_ioctl* and
+  change the prototype a little. Remove entries listed in the first
+  part of cdrom_ioctl_; if your code was OK, these are
+  just calls to the routines you adapted in the previous step.
+- You may remove all remaining memory checking code in the
+  *audio_ioctl()* function that deals with audio commands (these are
+  listed in the second part of cdrom_ioctl_). There is no
+  need for memory allocation either, so most *case*s in the *switch*
+  statement look similar to::
+
+        case CDROMREADTOCENTRY:
+                get_toc_entry((struct cdrom_tocentry *) arg);
+
+- All remaining *ioctl* cases must be moved to a separate
+  function, *_ioctl*, the device-dependent *ioctl()'s*. Note that
+  memory checking and allocation must be kept in this code!
+- Change the prototypes of *_open()* and
+  *_release()*, and remove any strategic code (i.e., tray
+  movement, door locking, etc.).
+- Try to recompile the drivers. We advise you to use modules, both
+  for `cdrom.o` and your driver, as debugging is much easier this
+  way.
+
+Thanks
+======
+
+Thanks to all the people involved. First, Erik Andersen, who has
+taken over the torch in maintaining `cdrom.c` and integrating much
+CD-ROM-related code in the 2.1-kernel. Thanks to Scott Snyder and
+Gerd Knorr, who were the first to implement this interface for SCSI
+and IDE-CD drivers and added many ideas for extension of the data
+structures relative to kernel 2.0. Further thanks to Heiko Eißfeldt,
+Thomas Quinot, Jon Tombs, Ken Pizzini, Eberhard Mönkeberg and Andrew Kroll,
+the Linux CD-ROM device driver developers who were kind
+enough to give suggestions and criticisms during the writing. Finally
+of course, I want to thank Linus Torvalds for making this possible in
+the first place.
diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
index 933268b8d6a5..5d1e0a4a7d84 100644
--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -7,7 +7,7 @@
    License.  See linux/COPYING for more information.
 
    Uniform CD-ROM driver for Linux.
-   See Documentation/cdrom/cdrom-standard.tex for usage information.
+   See Documentation/cdrom/cdrom-standard.txt for usage information.
The routines in the file provide a uniform interface between the software that uses CD-ROMs and the various low-level drivers that From 8ea618899b6b4fbe97c8462e7d769867307de011 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Wed, 12 Jun 2019 14:52:40 -0300 Subject: [PATCH 071/129] docs: cdrom: convert docs to ReST and rename to *.rst The stuff there is almost already at ReST format. A conversion for them is trivial: just add a missing titles and fix some scape codes for them to match ReST syntax. While here, rename the cdrom-standard.txt, with was converted from LaTeX to ReST on the previous patch, and add it to the index file. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- ...{cdrom-standard.txt => cdrom-standard.rst} | 0 Documentation/cdrom/{ide-cd => ide-cd.rst} | 178 +++++++++--------- Documentation/cdrom/index.rst | 19 ++ ...{packet-writing.txt => packet-writing.rst} | 27 ++- MAINTAINERS | 2 +- drivers/block/Kconfig | 2 +- drivers/cdrom/cdrom.c | 2 +- drivers/ide/ide-cd.c | 2 +- 8 files changed, 131 insertions(+), 101 deletions(-) rename Documentation/cdrom/{cdrom-standard.txt => cdrom-standard.rst} (100%) rename Documentation/cdrom/{ide-cd => ide-cd.rst} (84%) create mode 100644 Documentation/cdrom/index.rst rename Documentation/cdrom/{packet-writing.txt => packet-writing.rst} (91%) diff --git a/Documentation/cdrom/cdrom-standard.txt b/Documentation/cdrom/cdrom-standard.rst similarity index 100% rename from Documentation/cdrom/cdrom-standard.txt rename to Documentation/cdrom/cdrom-standard.rst diff --git a/Documentation/cdrom/ide-cd b/Documentation/cdrom/ide-cd.rst similarity index 84% rename from Documentation/cdrom/ide-cd rename to Documentation/cdrom/ide-cd.rst index a5f2a7f1ff46..dadc94ef6b6c 100644 --- a/Documentation/cdrom/ide-cd +++ b/Documentation/cdrom/ide-cd.rst @@ -1,18 +1,20 @@ IDE-CD 
driver documentation -Originally by scott snyder (19 May 1996) -Carrying on the torch is: Erik Andersen -New maintainers (19 Oct 1998): Jens Axboe +=========================== + +:Originally by: scott snyder (19 May 1996) +:Carrying on the torch is: Erik Andersen +:New maintainers (19 Oct 1998): Jens Axboe 1. Introduction --------------- -The ide-cd driver should work with all ATAPI ver 1.2 to ATAPI 2.6 compliant +The ide-cd driver should work with all ATAPI ver 1.2 to ATAPI 2.6 compliant CDROM drives which attach to an IDE interface. Note that some CDROM vendors (including Mitsumi, Sony, Creative, Aztech, and Goldstar) have made both ATAPI-compliant drives and drives which use a proprietary interface. If your drive uses one of those proprietary interfaces, this driver will not work with it (but one of the other CDROM drivers -probably will). This driver will not work with `ATAPI' drives which +probably will). This driver will not work with `ATAPI` drives which attach to the parallel port. In addition, there is at least one drive (CyCDROM CR520ie) which attaches to the IDE port but is not ATAPI; this driver will not work with drives like that either (but see the @@ -31,7 +33,7 @@ This driver provides the following features: from audio tracks. The program cdda2wav can be used for this. Note, however, that only some drives actually support this. - - There is now support for CDROM changers which comply with the + - There is now support for CDROM changers which comply with the ATAPI 2.6 draft standard (such as the NEC CDR-251). This additional functionality includes a function call to query which slot is the currently selected slot, a function call to query which slots contain @@ -49,11 +51,11 @@ This driver provides the following features: driver. 1. Make sure that the ide and ide-cd drivers are compiled into the - kernel you're using. 
When configuring the kernel, in the section - entitled "Floppy, IDE, and other block devices", say either `Y' - (which will compile the support directly into the kernel) or `M' + kernel you're using. When configuring the kernel, in the section + entitled "Floppy, IDE, and other block devices", say either `Y` + (which will compile the support directly into the kernel) or `M` (to compile support as a module which can be loaded and unloaded) - to the options: + to the options:: ATA/ATAPI/MFM/RLL support Include IDE/ATAPI CDROM support @@ -72,35 +74,35 @@ This driver provides the following features: address and an IRQ number, the standard assignments being 0x1f0 and 14 for the primary interface and 0x170 and 15 for the secondary interface. Each interface can control up to two devices, - where each device can be a hard drive, a CDROM drive, a floppy drive, - or a tape drive. The two devices on an interface are called `master' - and `slave'; this is usually selectable via a jumper on the drive. + where each device can be a hard drive, a CDROM drive, a floppy drive, + or a tape drive. The two devices on an interface are called `master` + and `slave`; this is usually selectable via a jumper on the drive. Linux names these devices as follows. The master and slave devices - on the primary IDE interface are called `hda' and `hdb', + on the primary IDE interface are called `hda` and `hdb`, respectively. The drives on the secondary interface are called - `hdc' and `hdd'. (Interfaces at other locations get other letters + `hdc` and `hdd`. (Interfaces at other locations get other letters in the third position; see Documentation/ide/ide.txt.) If you want your CDROM drive to be found automatically by the driver, you should make sure your IDE interface uses either the primary or secondary addresses mentioned above. In addition, if the CDROM drive is the only device on the IDE interface, it should - be jumpered as `master'. 
(If for some reason you cannot configure + be jumpered as `master`. (If for some reason you cannot configure your system in this manner, you can probably still use the driver. You may have to pass extra configuration information to the kernel when you boot, however. See Documentation/ide/ide.txt for more information.) 4. Boot the system. If the drive is recognized, you should see a - message which looks like + message which looks like:: hdb: NEC CD-ROM DRIVE:260, ATAPI CDROM drive If you do not see this, see section 5 below. 5. You may want to create a symbolic link /dev/cdrom pointing to the - actual device. You can do this with the command + actual device. You can do this with the command:: ln -s /dev/hdX /dev/cdrom @@ -108,14 +110,14 @@ This driver provides the following features: drive is installed. 6. You should be able to see any error messages from the driver with - the `dmesg' command. + the `dmesg` command. 3. Basic usage -------------- -An ISO 9660 CDROM can be mounted by putting the disc in the drive and -typing (as root) +An ISO 9660 CDROM can be mounted by putting the disc in the drive and +typing (as root):: mount -t iso9660 /dev/cdrom /mnt/cdrom @@ -123,7 +125,7 @@ where it is assumed that /dev/cdrom is a link pointing to the actual device (as described in step 5 of the last section) and /mnt/cdrom is an empty directory. You should now be able to see the contents of the CDROM under the /mnt/cdrom directory. If you want to eject the CDROM, -you must first dismount it with a command like +you must first dismount it with a command like:: umount /mnt/cdrom @@ -148,7 +150,7 @@ such as cdda2wav. The only types of drive which I've heard support this are Sony and Toshiba drives. You will get errors if you try to use this function on a drive which does not support it. 
-For supported changers, you can use the `cdchange' program (appended to +For supported changers, you can use the `cdchange` program (appended to the end of this file) to switch between changer slots. Note that the drive should be unmounted before attempting this. The program takes two arguments: the CDROM device, and the slot number to which you wish @@ -165,7 +167,7 @@ Documentation/ide/ide.txt for current information about the underlying IDE support code. Some of these items apply only to earlier versions of the driver, but are mentioned here for completeness. -In most cases, you should probably check with `dmesg' for any errors +In most cases, you should probably check with `dmesg` for any errors from the driver. a. Drive is not detected during booting. @@ -184,9 +186,9 @@ a. Drive is not detected during booting. - If the autoprobing is not finding your drive, you can tell the driver to assume that one exists by using a lilo option of the - form `hdX=cdrom', where X is the drive letter corresponding to - where your drive is installed. Note that if you do this and you - see a boot message like + form `hdX=cdrom`, where X is the drive letter corresponding to + where your drive is installed. Note that if you do this and you + see a boot message like:: hdX: ATAPI cdrom (?) @@ -220,7 +222,7 @@ b. Timeout/IRQ errors. probably not making it to the host. - IRQ problems may also be indicated by the message - `IRQ probe failed ()' while booting. If is zero, that + `IRQ probe failed ()` while booting. If is zero, that means that the system did not see an interrupt from the drive when it was expecting one (on any feasible IRQ). If is negative, that means the system saw interrupts on multiple IRQ lines, when @@ -240,27 +242,27 @@ b. Timeout/IRQ errors. there are hardware problems with the interrupt setup; they apparently don't use interrupts. 
- - If you own a Pioneer DR-A24X, you _will_ get nasty error messages + - If you own a Pioneer DR-A24X, you _will_ get nasty error messages on boot such as "irq timeout: status=0x50 { DriveReady SeekComplete }" The Pioneer DR-A24X CDROM drives are fairly popular these days. Unfortunately, these drives seem to become very confused when we perform the standard Linux ATA disk drive probe. If you own one of these drives, - you can bypass the ATA probing which confuses these CDROM drives, by - adding `append="hdX=noprobe hdX=cdrom"' to your lilo.conf file and running - lilo (again where X is the drive letter corresponding to where your drive + you can bypass the ATA probing which confuses these CDROM drives, by + adding `append="hdX=noprobe hdX=cdrom"` to your lilo.conf file and running + lilo (again where X is the drive letter corresponding to where your drive is installed.) - + c. System hangups. - If the system locks up when you try to access the CDROM, the most likely cause is that you have a buggy IDE adapter which doesn't properly handle simultaneous transactions on multiple interfaces. The most notorious of these is the CMD640B chip. This problem can - be worked around by specifying the `serialize' option when + be worked around by specifying the `serialize` option when booting. Recent kernels should be able to detect the need for this automatically in most cases, but the detection is not foolproof. See Documentation/ide/ide.txt for more information - about the `serialize' option and the CMD640B. + about the `serialize` option and the CMD640B. - Note that many MS-DOS CDROM drivers will work with such buggy hardware, apparently because they never attempt to overlap CDROM @@ -269,14 +271,14 @@ c. System hangups. d. Can't mount a CDROM. - - If you get errors from mount, it may help to check `dmesg' to see + - If you get errors from mount, it may help to check `dmesg` to see if there are any more specific errors from the driver or from the filesystem. 
- Make sure there's a CDROM loaded in the drive, and that's it's an ISO 9660 disc. You can't mount an audio CD. - - With the CDROM in the drive and unmounted, try something like + - With the CDROM in the drive and unmounted, try something like:: cat /dev/cdrom | od | more @@ -284,9 +286,9 @@ d. Can't mount a CDROM. OK, and the problem is at the filesystem level (i.e., the CDROM is not ISO 9660 or has errors in the filesystem structure). - - If you see `not a block device' errors, check that the definitions + - If you see `not a block device` errors, check that the definitions of the device special files are correct. They should be as - follows: + follows:: brw-rw---- 1 root disk 3, 0 Nov 11 18:48 /dev/hda brw-rw---- 1 root disk 3, 64 Nov 11 18:48 /dev/hdb @@ -301,7 +303,7 @@ d. Can't mount a CDROM. If you have a /dev/cdrom symbolic link, check that it is pointing to the correct device file. - If you hear people talking of the devices `hd1a' and `hd1b', these + If you hear people talking of the devices `hd1a` and `hd1b`, these were old names for what are now called hdc and hdd. Those names should be considered obsolete. @@ -311,8 +313,8 @@ d. Can't mount a CDROM. always give meaningful error messages. -e. Directory listings are unpredictably truncated, and `dmesg' shows - `buffer botch' error messages from the driver. +e. Directory listings are unpredictably truncated, and `dmesg` shows + `buffer botch` error messages from the driver. - There was a bug in the version of the driver in 1.2.x kernels which could cause this. It was fixed in 1.3.0. If you can't @@ -335,34 +337,36 @@ f. Data corruption. 5. cdchange.c ------------- -/* - * cdchange.c [-v] [] - * - * This loads a CDROM from a specified slot in a changer, and displays - * information about the changer status. The drive should be unmounted before - * using this program. - * - * Changer information is displayed if either the -v flag is specified - * or no slot was specified. 
- * - * Based on code originally from Gerhard Zuber . - * Changer status information, and rewrite for the new Uniform CDROM driver - * interface by Erik Andersen . - */ +:: -#include -#include -#include -#include -#include -#include -#include -#include + /* + * cdchange.c [-v] [] + * + * This loads a CDROM from a specified slot in a changer, and displays + * information about the changer status. The drive should be unmounted before + * using this program. + * + * Changer information is displayed if either the -v flag is specified + * or no slot was specified. + * + * Based on code originally from Gerhard Zuber . + * Changer status information, and rewrite for the new Uniform CDROM driver + * interface by Erik Andersen . + */ + + #include + #include + #include + #include + #include + #include + #include + #include -int -main (int argc, char **argv) -{ + int + main (int argc, char **argv) + { char *program; char *device; int fd; /* file descriptor for CD-ROM device */ @@ -382,30 +386,30 @@ main (int argc, char **argv) fprintf (stderr, " Slots are numbered 1 -- n.\n"); exit (1); } - + if (strcmp (argv[0], "-v") == 0) { verbose = 1; ++argv; --argc; } - + device = argv[0]; - + if (argc == 2) slot = atoi (argv[1]) - 1; - /* open device */ + /* open device */ fd = open(device, O_RDONLY | O_NONBLOCK); if (fd < 0) { - fprintf (stderr, "%s: open failed for `%s': %s\n", + fprintf (stderr, "%s: open failed for `%s`: %s\n", program, device, strerror (errno)); exit (1); } - /* Check CD player status */ + /* Check CD player status */ total_slots_available = ioctl (fd, CDROM_CHANGER_NSLOTS); if (total_slots_available <= 1 ) { - fprintf (stderr, "%s: Device `%s' is not an ATAPI " + fprintf (stderr, "%s: Device `%s` is not an ATAPI " "compliant CD changer.\n", program, device); exit (1); } @@ -418,7 +422,7 @@ main (int argc, char **argv) exit (1); } - /* load */ + /* load */ slot=ioctl (fd, CDROM_SELECT_DISC, slot); if (slot<0) { fflush(stdout); @@ -462,14 +466,14 @@ main (int argc, 
char **argv) for (x_slot=0; x_slot= -2KB on such a disc. For example, it should be possible to do: +2KB on such a disc. For example, it should be possible to do:: # dvd+rw-format /dev/hdc (only needed if the disc has never been formatted) @@ -54,7 +61,7 @@ follow the specification, but suffer bad performance problems if the writes are not 32KB aligned. Both problems can be solved by using the pktcdvd driver, which always -generates aligned writes. +generates aligned writes:: # dvd+rw-format /dev/hdc # pktsetup dev_name /dev/hdc @@ -83,7 +90,7 @@ Notes - Since the pktcdvd driver makes the disc appear as a regular block device with a 2KB block size, you can put any filesystem you like on - the disc. For example, run: + the disc. For example, run:: # /sbin/mke2fs /dev/pktcdvd/dev_name @@ -97,7 +104,7 @@ Since Linux 2.6.20, the pktcdvd module has a sysfs interface and can be controlled by it. For example the "pktcdvd" tool uses this interface. (see http://tom.ist-im-web.de/download/pktcdvd ) -"pktcdvd" works similar to "pktsetup", e.g.: +"pktcdvd" works similar to "pktsetup", e.g.:: # pktcdvd -a dev_name /dev/hdc # mkudffs /dev/pktcdvd/dev_name @@ -115,7 +122,7 @@ For a description of the sysfs interface look into the file: Using the pktcdvd debugfs interface ----------------------------------- -To read pktcdvd device infos in human readable form, do: +To read pktcdvd device infos in human readable form, do:: # cat /sys/kernel/debug/pktcdvd/pktcdvd[0-7]/info diff --git a/MAINTAINERS b/MAINTAINERS index 92eb34679b26..c95c29735327 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -7610,7 +7610,7 @@ IDE/ATAPI DRIVERS M: Borislav Petkov L: linux-ide@vger.kernel.org S: Maintained -F: Documentation/cdrom/ide-cd +F: Documentation/cdrom/ide-cd.rst F: drivers/ide/ide-cd* IDEAPAD LAPTOP EXTRAS DRIVER diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig index 20bb4bfa4be6..96ec7e0fc1ea 100644 --- a/drivers/block/Kconfig +++ b/drivers/block/Kconfig @@ -347,7 +347,7 @@ config 
CDROM_PKTCDVD is possible. DVD-RW disks must be in restricted overwrite mode. - See the file + See the file for further information on the use of this driver. To compile this driver as a module, choose M here: the diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c index 5d1e0a4a7d84..ac42ae4651ce 100644 --- a/drivers/cdrom/cdrom.c +++ b/drivers/cdrom/cdrom.c @@ -7,7 +7,7 @@ License. See linux/COPYING for more information. Uniform CD-ROM driver for Linux. - See Documentation/cdrom/cdrom-standard.txt for usage information. + See Documentation/cdrom/cdrom-standard.rst for usage information. The routines in the file provide a uniform interface between the software that uses CD-ROMs and the various low-level drivers that diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c index 3b15adc6ce98..9d117936bee1 100644 --- a/drivers/ide/ide-cd.c +++ b/drivers/ide/ide-cd.c @@ -9,7 +9,7 @@ * May be copied or modified under the terms of the GNU General Public * License. See linux/COPYING for more information. * - * See Documentation/cdrom/ide-cd for usage information. + * See Documentation/cdrom/ide-cd.rst for usage information. * * Suggestions are welcome. Patches that work are more welcome though. ;-) * From f0ba43774cea3fc14732bb9243ce7238ae8a3202 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Wed, 12 Jun 2019 14:52:43 -0300 Subject: [PATCH 072/129] docs: convert docs to ReST and rename to *.rst The conversion is actually: - add blank lines and indentation in order to identify paragraphs; - fix tables markups; - add some lists markups; - mark literal blocks; - adjust title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. 
Signed-off-by: Mauro Carvalho Chehab Acked-by: Bjorn Helgaas Acked-by: Mark Brown Signed-off-by: Jonathan Corbet --- ...{cache-policies.txt => cache-policies.rst} | 24 +- .../device-mapper/{cache.txt => cache.rst} | 206 +++++++++------- .../device-mapper/{delay.txt => delay.rst} | 27 ++- .../{dm-crypt.txt => dm-crypt.rst} | 57 +++-- .../{dm-flakey.txt => dm-flakey.rst} | 45 ++-- .../{dm-init.txt => dm-init.rst} | 75 +++--- .../{dm-integrity.txt => dm-integrity.rst} | 62 +++-- .../device-mapper/{dm-io.txt => dm-io.rst} | 14 +- .../device-mapper/{dm-log.txt => dm-log.rst} | 5 +- ...m-queue-length.txt => dm-queue-length.rst} | 25 +- .../{dm-raid.txt => dm-raid.rst} | 225 +++++++++++------- ...m-service-time.txt => dm-service-time.rst} | 68 +++--- Documentation/device-mapper/dm-uevent.rst | 110 +++++++++ Documentation/device-mapper/dm-uevent.txt | 97 -------- .../{dm-zoned.txt => dm-zoned.rst} | 10 +- .../device-mapper/{era.txt => era.rst} | 36 +-- Documentation/device-mapper/index.rst | 44 ++++ .../device-mapper/{kcopyd.txt => kcopyd.rst} | 10 +- Documentation/device-mapper/linear.rst | 63 +++++ Documentation/device-mapper/linear.txt | 61 ----- .../{log-writes.txt => log-writes.rst} | 91 +++---- ...ersistent-data.txt => persistent-data.rst} | 4 + .../{snapshot.txt => snapshot.rst} | 116 ++++----- .../{statistics.txt => statistics.rst} | 62 ++--- Documentation/device-mapper/striped.rst | 61 +++++ Documentation/device-mapper/striped.txt | 57 ----- .../device-mapper/{switch.txt => switch.rst} | 47 ++-- ...provisioning.txt => thin-provisioning.rst} | 68 ++++-- .../{unstriped.txt => unstriped.rst} | 93 ++++---- .../device-mapper/{verity.txt => verity.rst} | 20 +- .../{writecache.txt => writecache.rst} | 13 +- .../device-mapper/{zero.txt => zero.rst} | 14 +- .../filesystems/ubifs-authentication.md | 4 +- drivers/md/Kconfig | 2 +- drivers/md/dm-init.c | 2 +- drivers/md/dm-raid.c | 2 +- 36 files changed, 1124 insertions(+), 796 deletions(-) rename 
Documentation/device-mapper/{cache-policies.txt => cache-policies.rst} (94%) rename Documentation/device-mapper/{cache.txt => cache.rst} (61%) rename Documentation/device-mapper/{delay.txt => delay.rst} (53%) rename Documentation/device-mapper/{dm-crypt.txt => dm-crypt.rst} (87%) rename Documentation/device-mapper/{dm-flakey.txt => dm-flakey.rst} (60%) rename Documentation/device-mapper/{dm-init.txt => dm-init.rst} (69%) rename Documentation/device-mapper/{dm-integrity.txt => dm-integrity.rst} (90%) rename Documentation/device-mapper/{dm-io.txt => dm-io.rst} (92%) rename Documentation/device-mapper/{dm-log.txt => dm-log.rst} (90%) rename Documentation/device-mapper/{dm-queue-length.txt => dm-queue-length.rst} (76%) rename Documentation/device-mapper/{dm-raid.txt => dm-raid.rst} (71%) rename Documentation/device-mapper/{dm-service-time.txt => dm-service-time.rst} (60%) create mode 100644 Documentation/device-mapper/dm-uevent.rst delete mode 100644 Documentation/device-mapper/dm-uevent.txt rename Documentation/device-mapper/{dm-zoned.txt => dm-zoned.rst} (97%) rename Documentation/device-mapper/{era.txt => era.rst} (70%) create mode 100644 Documentation/device-mapper/index.rst rename Documentation/device-mapper/{kcopyd.txt => kcopyd.rst} (93%) create mode 100644 Documentation/device-mapper/linear.rst delete mode 100644 Documentation/device-mapper/linear.txt rename Documentation/device-mapper/{log-writes.txt => log-writes.rst} (61%) rename Documentation/device-mapper/{persistent-data.txt => persistent-data.rst} (98%) rename Documentation/device-mapper/{snapshot.txt => snapshot.rst} (62%) rename Documentation/device-mapper/{statistics.txt => statistics.rst} (87%) create mode 100644 Documentation/device-mapper/striped.rst delete mode 100644 Documentation/device-mapper/striped.txt rename Documentation/device-mapper/{switch.txt => switch.rst} (84%) rename Documentation/device-mapper/{thin-provisioning.txt => thin-provisioning.rst} (92%) rename 
Documentation/device-mapper/{unstriped.txt => unstriped.rst} (60%) rename Documentation/device-mapper/{verity.txt => verity.rst} (98%) rename Documentation/device-mapper/{writecache.txt => writecache.rst} (96%) rename Documentation/device-mapper/{zero.txt => zero.rst} (83%) diff --git a/Documentation/device-mapper/cache-policies.txt b/Documentation/device-mapper/cache-policies.rst similarity index 94% rename from Documentation/device-mapper/cache-policies.txt rename to Documentation/device-mapper/cache-policies.rst index 86786d87d9a8..b17fe352fc41 100644 --- a/Documentation/device-mapper/cache-policies.txt +++ b/Documentation/device-mapper/cache-policies.rst @@ -1,3 +1,4 @@ +============================= Guidance for writing policies ============================= @@ -30,7 +31,7 @@ multiqueue (mq) This policy is now an alias for smq (see below). -The following tunables are accepted, but have no effect: +The following tunables are accepted, but have no effect:: 'sequential_threshold <#nr_sequential_ios>' 'random_threshold <#nr_random_ios>' @@ -56,7 +57,9 @@ mq policy's hints to be dropped. Also, performance of the cache may degrade slightly until smq recalculates the origin device's hotspots that should be cached. -Memory usage: +Memory usage +^^^^^^^^^^^^ + The mq policy used a lot of memory; 88 bytes per cache block on a 64 bit machine. @@ -69,7 +72,9 @@ cache block). All this means smq uses ~25bytes per cache block. Still a lot of memory, but a substantial improvement nontheless. -Level balancing: +Level balancing +^^^^^^^^^^^^^^^ + mq placed entries in different levels of the multiqueue structures based on their hit count (~ln(hit count)). This meant the bottom levels generally had the most entries, and the top ones had very @@ -94,7 +99,9 @@ is used to decide which blocks to promote. If the hotspot queue is performing badly then it starts moving entries more quickly between levels. This lets it adapt to new IO patterns very quickly. 
-Performance: +Performance +^^^^^^^^^^^ + Testing smq shows substantially better performance than mq. cleaner @@ -105,16 +112,19 @@ The cleaner writes back all dirty blocks in a cache to decommission it. Examples ======== -The syntax for a table is: +The syntax for a table is:: + cache <#feature_args> []* <#policy_args> []* -The syntax to send a message using the dmsetup command is: +The syntax to send a message using the dmsetup command is:: + dmsetup message 0 sequential_threshold 1024 dmsetup message 0 random_threshold 8 -Using dmsetup: +Using dmsetup:: + dmsetup create blah --table "0 268435456 cache /dev/sdb /dev/sdc \ /dev/sdd 512 0 mq 4 sequential_threshold 1024 random_threshold 8" creates a 128GB large mapped device named 'blah' with the diff --git a/Documentation/device-mapper/cache.txt b/Documentation/device-mapper/cache.rst similarity index 61% rename from Documentation/device-mapper/cache.txt rename to Documentation/device-mapper/cache.rst index 8ae1cf8e94da..f15e5254d05b 100644 --- a/Documentation/device-mapper/cache.txt +++ b/Documentation/device-mapper/cache.rst @@ -1,3 +1,7 @@ +===== +Cache +===== + Introduction ============ @@ -24,10 +28,13 @@ scenarios (eg. a vm image server). Glossary ======== - Migration - Movement of the primary copy of a logical block from one + Migration + Movement of the primary copy of a logical block from one device to the other. - Promotion - Migration from slow device to fast device. - Demotion - Migration from fast device to slow device. + Promotion + Migration from slow device to fast device. + Demotion + Migration from fast device to slow device. 
The origin device always contains a copy of the logical block, which may be out of date or kept in sync with the copy on the cache device @@ -169,45 +176,53 @@ Target interface Constructor ----------- - cache - <#feature args> []* - <#policy args> [policy args]* + :: - metadata dev : fast device holding the persistent metadata - cache dev : fast device holding cached data blocks - origin dev : slow device holding original data blocks - block size : cache unit size in sectors + cache + <#feature args> []* + <#policy args> [policy args]* - #feature args : number of feature arguments passed - feature args : writethrough or passthrough (The default is writeback.) + ================ ======================================================= + metadata dev fast device holding the persistent metadata + cache dev fast device holding cached data blocks + origin dev slow device holding original data blocks + block size cache unit size in sectors - policy : the replacement policy to use - #policy args : an even number of arguments corresponding to - key/value pairs passed to the policy - policy args : key/value pairs passed to the policy - E.g. 'sequential_threshold 1024' - See cache-policies.txt for details. + #feature args number of feature arguments passed + feature args writethrough or passthrough (The default is writeback.) + + policy the replacement policy to use + #policy args an even number of arguments corresponding to + key/value pairs passed to the policy + policy args key/value pairs passed to the policy + E.g. 'sequential_threshold 1024' + See cache-policies.txt for details. + ================ ======================================================= Optional feature arguments are: - writethrough : write through caching that prohibits cache block - content from being different from origin block content. 
- Without this argument, the default behaviour is to write - back cache block contents later for performance reasons, - so they may differ from the corresponding origin blocks. - passthrough : a degraded mode useful for various cache coherency - situations (e.g., rolling back snapshots of - underlying storage). Reads and writes always go to - the origin. If a write goes to a cached origin - block, then the cache block is invalidated. - To enable passthrough mode the cache must be clean. - metadata2 : use version 2 of the metadata. This stores the dirty bits - in a separate btree, which improves speed of shutting - down the cache. + ==================== ======================================================== + writethrough write through caching that prohibits cache block + content from being different from origin block content. + Without this argument, the default behaviour is to write + back cache block contents later for performance reasons, + so they may differ from the corresponding origin blocks. - no_discard_passdown : disable passing down discards from the cache - to the origin's data device. + passthrough a degraded mode useful for various cache coherency + situations (e.g., rolling back snapshots of + underlying storage). Reads and writes always go to + the origin. If a write goes to a cached origin + block, then the cache block is invalidated. + To enable passthrough mode the cache must be clean. + + metadata2 use version 2 of the metadata. This stores the dirty + bits in a separate btree, which improves speed of + shutting down the cache. + + no_discard_passdown disable passing down discards from the cache + to the origin's data device. + ==================== ======================================================== A policy called 'default' is always registered. This is an alias for the policy we currently think is giving best all round performance. @@ -218,54 +233,61 @@ the characteristics of a specific policy, always request it by name. 
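The constructor fields described above can be assembled mechanically. The following is a minimal sketch, not part of the kernel or of dmsetup: the helper name ``cache_table`` and its keyword names are hypothetical, and the field order is taken from the table lines in this document's Examples section.

```python
def cache_table(start, length, metadata_dev, cache_dev, origin_dev,
                block_size, features=(), policy="default", policy_args=None):
    """Build a dm-cache table line (hypothetical helper, for illustration only).

    policy_args is a mapping of key/value pairs; the emitted #policy args
    count is the number of words, so it is always even.
    """
    # Flatten {"key": value, ...} into ["key", "value", ...].
    flat = [str(word) for pair in (policy_args or {}).items() for word in pair]
    fields = [start, length, "cache",
              metadata_dev, cache_dev, origin_dev, block_size,
              len(features), *features,
              policy, len(flat), *flat]
    return " ".join(str(field) for field in fields)


if __name__ == "__main__":
    print(cache_table(0, 41943040, "/dev/mapper/metadata", "/dev/mapper/ssd",
                      "/dev/mapper/origin", 1024, ["writeback"], "mq",
                      {"sequential_threshold": 1024, "random_threshold": 8}))
```

The resulting string is what would be passed to ``dmsetup create <name> --table``. Counting words rather than pairs for ``#policy args`` follows the ``mq 4 sequential_threshold 1024 random_threshold 8`` example.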
Status ------ - <#used metadata blocks>/<#total metadata blocks> - <#used cache blocks>/<#total cache blocks> -<#read hits> <#read misses> <#write hits> <#write misses> -<#demotions> <#promotions> <#dirty> <#features> * -<#core args> * <#policy args> * - +:: -metadata block size : Fixed block size for each metadata block in - sectors -#used metadata blocks : Number of metadata blocks used -#total metadata blocks : Total number of metadata blocks -cache block size : Configurable block size for the cache device - in sectors -#used cache blocks : Number of blocks resident in the cache -#total cache blocks : Total number of cache blocks -#read hits : Number of times a READ bio has been mapped - to the cache -#read misses : Number of times a READ bio has been mapped - to the origin -#write hits : Number of times a WRITE bio has been mapped - to the cache -#write misses : Number of times a WRITE bio has been - mapped to the origin -#demotions : Number of times a block has been removed - from the cache -#promotions : Number of times a block has been moved to - the cache -#dirty : Number of blocks in the cache that differ - from the origin -#feature args : Number of feature args to follow -feature args : 'writethrough' (optional) -#core args : Number of core arguments (must be even) -core args : Key/value pairs for tuning the core - e.g. migration_threshold -policy name : Name of the policy -#policy args : Number of policy arguments to follow (must be even) -policy args : Key/value pairs e.g. sequential_threshold -cache metadata mode : ro if read-only, rw if read-write - In serious cases where even a read-only mode is deemed unsafe - no further I/O will be permitted and the status will just - contain the string 'Fail'. The userspace recovery tools - should then be used. -needs_check : 'needs_check' if set, '-' if not set - A metadata operation has failed, resulting in the needs_check - flag being set in the metadata's superblock. 
The metadata - device must be deactivated and checked/repaired before the - cache can be made fully operational again. '-' indicates - needs_check is not set. + <#used metadata blocks>/<#total metadata blocks> + <#used cache blocks>/<#total cache blocks> + <#read hits> <#read misses> <#write hits> <#write misses> + <#demotions> <#promotions> <#dirty> <#features> * + <#core args> * <#policy args> * + + + +========================= ===================================================== +metadata block size Fixed block size for each metadata block in + sectors +#used metadata blocks Number of metadata blocks used +#total metadata blocks Total number of metadata blocks +cache block size Configurable block size for the cache device + in sectors +#used cache blocks Number of blocks resident in the cache +#total cache blocks Total number of cache blocks +#read hits Number of times a READ bio has been mapped + to the cache +#read misses Number of times a READ bio has been mapped + to the origin +#write hits Number of times a WRITE bio has been mapped + to the cache +#write misses Number of times a WRITE bio has been + mapped to the origin +#demotions Number of times a block has been removed + from the cache +#promotions Number of times a block has been moved to + the cache +#dirty Number of blocks in the cache that differ + from the origin +#feature args Number of feature args to follow +feature args 'writethrough' (optional) +#core args Number of core arguments (must be even) +core args Key/value pairs for tuning the core + e.g. migration_threshold +policy name Name of the policy +#policy args Number of policy arguments to follow (must be even) +policy args Key/value pairs e.g. sequential_threshold +cache metadata mode ro if read-only, rw if read-write + + In serious cases where even a read-only mode is + deemed unsafe no further I/O will be permitted and + the status will just contain the string 'Fail'. + The userspace recovery tools should then be used. 
+needs_check 'needs_check' if set, '-' if not set + A metadata operation has failed, resulting in the + needs_check flag being set in the metadata's + superblock. The metadata device must be + deactivated and checked/repaired before the + cache can be made fully operational again. + '-' indicates needs_check is not set. +========================= ===================================================== Messages -------- @@ -274,11 +296,12 @@ Policies will have different tunables, specific to each one, so we need a generic way of getting and setting these. Device-mapper messages are used. (A sysfs interface would also be possible.) -The message format is: +The message format is:: -E.g. +E.g.:: + dmsetup message my_cache 0 sequential_threshold 1024 @@ -290,11 +313,12 @@ of values from 5 to 9. Each cblock must be expressed as a decimal value, in the future a variant message that takes cblock ranges expressed in hexadecimal may be needed to better support efficient invalidation of larger caches. The cache must be in passthrough mode -when invalidate_cblocks is used. +when invalidate_cblocks is used:: invalidate_cblocks [|-]* -E.g. 
+E.g.:: + dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789 Examples @@ -304,8 +328,10 @@ The test suite can be found here: https://github.com/jthornber/device-mapper-test-suite -dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \ - /dev/mapper/ssd /dev/mapper/origin 512 1 writeback default 0' -dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \ - /dev/mapper/ssd /dev/mapper/origin 1024 1 writeback \ - mq 4 sequential_threshold 1024 random_threshold 8' +:: + + dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \ + /dev/mapper/ssd /dev/mapper/origin 512 1 writeback default 0' + dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \ + /dev/mapper/ssd /dev/mapper/origin 1024 1 writeback \ + mq 4 sequential_threshold 1024 random_threshold 8' diff --git a/Documentation/device-mapper/delay.txt b/Documentation/device-mapper/delay.rst similarity index 53% rename from Documentation/device-mapper/delay.txt rename to Documentation/device-mapper/delay.rst index 6426c45273cb..917ba8c33359 100644 --- a/Documentation/device-mapper/delay.txt +++ b/Documentation/device-mapper/delay.rst @@ -1,10 +1,12 @@ +======== dm-delay ======== Device-Mapper's "delay" target delays reads and/or writes and maps them to different devices. -Parameters: +Parameters:: + [ [ ]] @@ -14,15 +16,16 @@ Delays are specified in milliseconds. 
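As a rough illustration of how a delay table line is put together, here is a small sketch. The helper ``delay_table`` is hypothetical, and the reading of each group as (device, offset, delay in milliseconds), with one to three groups allowed, is an assumption based on the nested brackets in the parameter line and on the example scripts in this document.

```python
def delay_table(size_sectors, *groups):
    """Build a dm-delay table line (hypothetical helper, for illustration only).

    Each group is assumed to be (device, offset_sectors, delay_ms).
    """
    if not 1 <= len(groups) <= 3:
        raise ValueError("expected one to three (device, offset, delay) groups")
    fields = [0, size_sectors, "delay"]
    for device, offset, delay_ms in groups:
        if offset < 0 or delay_ms < 0:
            raise ValueError("offset and delay must be non-negative")
        fields += [device, offset, delay_ms]
    return " ".join(str(field) for field in fields)


if __name__ == "__main__":
    # One group: the same delay for every request to /dev/sda1.
    print(delay_table(409600, ("/dev/sda1", 0, 500)))
    # Two groups: the second one carries the 500 ms delay.
    print(delay_table(409600, ("/dev/sda1", 0, 0), ("/dev/sdb1", 0, 500)))
```

The device paths and the size of 409600 sectors are placeholders; in the shell scripts the size comes from ``blockdev --getsz``.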
Example scripts =============== -[[ -#!/bin/sh -# Create device delaying rw operation for 500ms -echo "0 `blockdev --getsz $1` delay $1 0 500" | dmsetup create delayed -]] -[[ -#!/bin/sh -# Create device delaying only write operation for 500ms and -# splitting reads and writes to different devices $1 $2 -echo "0 `blockdev --getsz $1` delay $1 0 0 $2 0 500" | dmsetup create delayed -]] +:: + + #!/bin/sh + # Create device delaying rw operation for 500ms + echo "0 `blockdev --getsz $1` delay $1 0 500" | dmsetup create delayed + +:: + + #!/bin/sh + # Create device delaying only write operation for 500ms and + # splitting reads and writes to different devices $1 $2 + echo "0 `blockdev --getsz $1` delay $1 0 0 $2 0 500" | dmsetup create delayed diff --git a/Documentation/device-mapper/dm-crypt.txt b/Documentation/device-mapper/dm-crypt.rst similarity index 87% rename from Documentation/device-mapper/dm-crypt.txt rename to Documentation/device-mapper/dm-crypt.rst index 3b3e1de21c9c..8f4a3f889d43 100644 --- a/Documentation/device-mapper/dm-crypt.txt +++ b/Documentation/device-mapper/dm-crypt.rst @@ -1,5 +1,6 @@ +======== dm-crypt -========= +======== Device-Mapper's "crypt" target provides transparent encryption of block devices using the kernel crypto API. @@ -7,15 +8,20 @@ using the kernel crypto API. For a more detailed description of supported parameters see: https://gitlab.com/cryptsetup/cryptsetup/wikis/DMCrypt -Parameters: \ +Parameters:: + + \ [<#opt_params> ] Encryption cipher, encryption mode and Initial Vector (IV) generator. - The cipher specifications format is: + The cipher specifications format is:: + cipher[:keycount]-chainmode-ivmode[:ivopts] - Examples: + + Examples:: + aes-cbc-essiv:sha256 aes-xts-plain64 serpent-xts-plain64 @@ -25,12 +31,17 @@ Parameters: \ as for the first format type. This format is mainly used for specification of authenticated modes. 
- The crypto API cipher specifications format is: + The crypto API cipher specifications format is:: + capi:cipher_api_spec-ivmode[:ivopts] - Examples: + + Examples:: + capi:cbc(aes)-essiv:sha256 capi:xts(aes)-plain64 - Examples of authenticated modes: + + Examples of authenticated modes:: + capi:gcm(aes)-random capi:authenc(hmac(sha256),xts(aes))-random capi:rfc7539(chacha20,poly1305)-random @@ -142,21 +153,21 @@ LUKS (Linux Unified Key Setup) is now the preferred way to set up disk encryption with dm-crypt using the 'cryptsetup' utility, see https://gitlab.com/cryptsetup/cryptsetup -[[ -#!/bin/sh -# Create a crypt device using dmsetup -dmsetup create crypt1 --table "0 `blockdev --getsz $1` crypt aes-cbc-essiv:sha256 babebabebabebabebabebabebabebabe 0 $1 0" -]] +:: -[[ -#!/bin/sh -# Create a crypt device using dmsetup when encryption key is stored in keyring service -dmsetup create crypt2 --table "0 `blockdev --getsize $1` crypt aes-cbc-essiv:sha256 :32:logon:my_prefix:my_key 0 $1 0" -]] + #!/bin/sh + # Create a crypt device using dmsetup + dmsetup create crypt1 --table "0 `blockdev --getsz $1` crypt aes-cbc-essiv:sha256 babebabebabebabebabebabebabebabe 0 $1 0" -[[ -#!/bin/sh -# Create a crypt device using cryptsetup and LUKS header with default cipher -cryptsetup luksFormat $1 -cryptsetup luksOpen $1 crypt1 -]] +:: + + #!/bin/sh + # Create a crypt device using dmsetup when encryption key is stored in keyring service + dmsetup create crypt2 --table "0 `blockdev --getsize $1` crypt aes-cbc-essiv:sha256 :32:logon:my_prefix:my_key 0 $1 0" + +:: + + #!/bin/sh + # Create a crypt device using cryptsetup and LUKS header with default cipher + cryptsetup luksFormat $1 + cryptsetup luksOpen $1 crypt1 diff --git a/Documentation/device-mapper/dm-flakey.txt b/Documentation/device-mapper/dm-flakey.rst similarity index 60% rename from Documentation/device-mapper/dm-flakey.txt rename to Documentation/device-mapper/dm-flakey.rst index 9f0e247d0877..86138735879d 100644 --- 
a/Documentation/device-mapper/dm-flakey.txt +++ b/Documentation/device-mapper/dm-flakey.rst @@ -1,3 +1,4 @@ +========= dm-flakey ========= @@ -15,17 +16,26 @@ underlying devices. Table parameters ---------------- + +:: + \ [ []] Mandatory parameters: - : Full pathname to the underlying block-device, or a - "major:minor" device-number. - : Starting sector within the device. - : Number of seconds device is available. - : Number of seconds device returns errors. + + : + Full pathname to the underlying block-device, or a + "major:minor" device-number. + : + Starting sector within the device. + : + Number of seconds device is available. + : + Number of seconds device returns errors. Optional feature parameters: + If no feature parameters are present, during the periods of unreliability, all I/O returns errors. @@ -41,17 +51,24 @@ Optional feature parameters: During , replace of the data of each matching bio with . - : The offset of the byte to replace. - Counting starts at 1, to replace the first byte. - : Either 'r' to corrupt reads or 'w' to corrupt writes. - 'w' is incompatible with drop_writes. - : The value (from 0-255) to write. - : Perform the replacement only if bio->bi_opf has all the - selected flags set. + : + The offset of the byte to replace. + Counting starts at 1, to replace the first byte. + : + Either 'r' to corrupt reads or 'w' to corrupt writes. + 'w' is incompatible with drop_writes. + : + The value (from 0-255) to write. + : + Perform the replacement only if bio->bi_opf has all the + selected flags set. 
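The corrupt_bio_byte rules above (1-based byte offset, direction match, value range, all selected flags set) can be modelled in a few lines. This is only a userspace sketch of the described behaviour, not the kernel implementation; the function name, argument order, and the choice to leave payloads shorter than the offset untouched are assumptions.

```python
def corrupt_bio_byte(payload, bio_dir, bio_flags, nth_byte, direction, value, flags):
    """Model of the corrupt_bio_byte feature (illustrative, not kernel code).

    payload   -- bytes of the bio being processed
    bio_dir   -- 'r' or 'w', direction of this bio
    bio_flags -- integer standing in for bio->bi_opf
    The remaining arguments mirror the four feature parameters.
    """
    if direction not in ("r", "w"):
        raise ValueError("direction must be 'r' or 'w'")
    if not 0 <= value <= 255:
        raise ValueError("value must be in the range 0-255")
    if nth_byte < 1:
        raise ValueError("byte offsets start at 1")
    data = bytearray(payload)
    matches_dir = bio_dir == direction
    # Replace only if *all* selected flags are set in the bio's flags.
    matches_flags = (bio_flags & flags) == flags
    if matches_dir and matches_flags and nth_byte <= len(data):
        data[nth_byte - 1] = value
    return bytes(data)


if __name__ == "__main__":
    # "corrupt_bio_byte 32 r 1 0": 32nd byte of READ bios becomes 1.
    out = corrupt_bio_byte(bytes(64), "r", 0, 32, "r", 1, 0)
    print(out[31])
```

With ``flags`` equal to 0 the flag test always passes, which is why the first example in this document affects every READ bio.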
Examples: + +Replaces the 32nd byte of READ bios with the value 1:: + corrupt_bio_byte 32 r 1 0 - - replaces the 32nd byte of READ bios with the value 1 + +Replaces the 224th byte of REQ_META (=32) bios with the value 0:: corrupt_bio_byte 224 w 0 32 - - replaces the 224th byte of REQ_META (=32) bios with the value 0 diff --git a/Documentation/device-mapper/dm-init.txt b/Documentation/device-mapper/dm-init.rst similarity index 69% rename from Documentation/device-mapper/dm-init.txt rename to Documentation/device-mapper/dm-init.rst index 130b3c3679c5..e5242ff17e9b 100644 --- a/Documentation/device-mapper/dm-init.txt +++ b/Documentation/device-mapper/dm-init.rst @@ -1,5 +1,6 @@ +================================ Early creation of mapped devices -==================================== +================================ It is possible to configure a device-mapper device to act as the root device for your system in two ways. @@ -12,15 +13,17 @@ The second is to create one or more device-mappers using the module parameter The format is specified as a string of data separated by commas and optionally semi-colons, where: + - a comma is used to separate fields like name, uuid, flags and table (specifies one device) - a semi-colon is used to separate devices. -So the format will look like this: +So the format will look like this:: dm-mod.create=,,,,[,
+][;,,,,
[,
+]+] -Where, +Where:: + ::= The device name. ::= xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | "" ::= The device minor number | "" @@ -29,7 +32,7 @@ Where, ::= "verity" | "linear" | ... (see list below) The dm line should be equivalent to the one used by the dmsetup tool with the ---concise argument. +`--concise` argument. Target types ============ @@ -38,32 +41,34 @@ Not all target types are available as there are serious risks in allowing activation of certain DM targets without first using userspace tools to check the validity of associated metadata. - "cache": constrained, userspace should verify cache device - "crypt": allowed - "delay": allowed - "era": constrained, userspace should verify metadata device - "flakey": constrained, meant for test - "linear": allowed - "log-writes": constrained, userspace should verify metadata device - "mirror": constrained, userspace should verify main/mirror device - "raid": constrained, userspace should verify metadata device - "snapshot": constrained, userspace should verify src/dst device - "snapshot-origin": allowed - "snapshot-merge": constrained, userspace should verify src/dst device - "striped": allowed - "switch": constrained, userspace should verify dev path - "thin": constrained, requires dm target message from userspace - "thin-pool": constrained, requires dm target message from userspace - "verity": allowed - "writecache": constrained, userspace should verify cache device - "zero": constrained, not meant for rootfs +======================= ======================================================= +`cache` constrained, userspace should verify cache device +`crypt` allowed +`delay` allowed +`era` constrained, userspace should verify metadata device +`flakey` constrained, meant for test +`linear` allowed +`log-writes` constrained, userspace should verify metadata device +`mirror` constrained, userspace should verify main/mirror device +`raid` constrained, userspace should verify metadata device +`snapshot` constrained, 
userspace should verify src/dst device +`snapshot-origin` allowed +`snapshot-merge` constrained, userspace should verify src/dst device +`striped` allowed +`switch` constrained, userspace should verify dev path +`thin` constrained, requires dm target message from userspace +`thin-pool` constrained, requires dm target message from userspace +`verity` allowed +`writecache` constrained, userspace should verify cache device +`zero` constrained, not meant for rootfs +======================= ======================================================= If the target is not listed above, it is constrained by default (not tested). Examples ======== An example of booting to a linear array made up of user-mode linux block -devices: +devices:: dm-mod.create="lroot,,,rw, 0 4096 linear 98:16 0, 4096 4096 linear 98:32 0" root=/dev/dm-0 @@ -71,8 +76,8 @@ This will boot to a rw dm-linear target of 8192 sectors split across two block devices identified by their major:minor numbers. After boot, udev will rename this target to /dev/mapper/lroot (depending on the rules). No uuid was assigned. -An example of multiple device-mappers, with the dm-mod.create="..." contents is shown here -split on multiple lines for readability: +An example of multiple device-mappers, with the dm-mod.create="..." 
contents +is shown here split on multiple lines for readability:: dm-linear,,1,rw, 0 32768 linear 8:1 0, @@ -84,30 +89,36 @@ split on multiple lines for readability: Other examples (per target): -"crypt": +"crypt":: + dm-crypt,,8,ro, 0 1048576 crypt aes-xts-plain64 babebabebabebabebabebabebabebabebabebabebabebabebabebabebabebabe 0 /dev/sda 0 1 allow_discards -"delay": +"delay":: + dm-delay,,4,ro,0 409600 delay /dev/sda1 0 500 -"linear": +"linear":: + dm-linear,,,rw, 0 32768 linear /dev/sda1 0, 32768 1024000 linear /dev/sda2 0, 1056768 204800 linear /dev/sda3 0, 1261568 512000 linear /dev/sda4 0 -"snapshot-origin": +"snapshot-origin":: + dm-snap-orig,,4,ro,0 409600 snapshot-origin 8:2 -"striped": +"striped":: + dm-striped,,4,ro,0 1638400 striped 4 4096 /dev/sda1 0 /dev/sda2 0 /dev/sda3 0 /dev/sda4 0 -"verity": +"verity":: + dm-verity,,4,ro, 0 1638400 verity 1 8:1 8:2 4096 4096 204800 1 sha256 fb1a5a0f00deb908d8b53cb270858975e76cf64105d412ce764225d53b8f3cfd diff --git a/Documentation/device-mapper/dm-integrity.txt b/Documentation/device-mapper/dm-integrity.rst similarity index 90% rename from Documentation/device-mapper/dm-integrity.txt rename to Documentation/device-mapper/dm-integrity.rst index d63d78ffeb73..a30aa91b5fbe 100644 --- a/Documentation/device-mapper/dm-integrity.txt +++ b/Documentation/device-mapper/dm-integrity.rst @@ -1,3 +1,7 @@ +============ +dm-integrity +============ + The dm-integrity target emulates a block device that has additional per-sector tags that can be used for storing integrity information. @@ -35,15 +39,16 @@ zeroes. If the superblock is neither valid nor zeroed, the dm-integrity target can't be loaded. To use the target for the first time: + 1. overwrite the superblock with zeroes 2. load the dm-integrity target with one-sector size, the kernel driver - will format the device + will format the device 3. unload the dm-integrity target 4. read the "provided_data_sectors" value from the superblock 5. 
load the dm-integrity target with the target size - "provided_data_sectors" + "provided_data_sectors" 6. if you want to use dm-integrity with dm-crypt, load the dm-crypt target - with the size "provided_data_sectors" + with the size "provided_data_sectors" Target arguments: @@ -51,17 +56,20 @@ Target arguments: 1. the underlying block device 2. the number of reserved sectors at the beginning of the device - the - dm-integrity won't read or write these sectors + dm-integrity won't read or write these sectors 3. the size of the integrity tag (if "-" is used, the size is taken from - the internal-hash algorithm) + the internal-hash algorithm) 4. mode: - D - direct writes (without journal) - in this mode, journaling is + + D - direct writes (without journal) + in this mode, journaling is not used and data sectors and integrity tags are written separately. In case of crash, it is possible that the data and integrity tag doesn't match. - J - journaled writes - data and integrity tags are written to the + J - journaled writes + data and integrity tags are written to the journal and atomicity is guaranteed. In case of crash, either both data and tag or none of them are written. The journaled mode degrades write throughput twice because the @@ -178,9 +186,12 @@ and the reloaded target would be non-functional. The layout of the formatted block device: -* reserved sectors (they are not used by this target, they can be used for - storing LUKS metadata or for other purpose), the size of the reserved - area is specified in the target arguments + +* reserved sectors + (they are not used by this target, they can be used for + storing LUKS metadata or for other purpose), the size of the reserved + area is specified in the target arguments + * superblock (4kiB) * magic string - identifies that the device was formatted * version @@ -192,40 +203,55 @@ The layout of the formatted block device: metadata and padding).
The user of this target should not send bios that access data beyond the "provided data sectors" limit. * flags - SB_FLAG_HAVE_JOURNAL_MAC - a flag is set if journal_mac is used - SB_FLAG_RECALCULATING - recalculating is in progress - SB_FLAG_DIRTY_BITMAP - journal area contains the bitmap of dirty - blocks + SB_FLAG_HAVE_JOURNAL_MAC + - a flag is set if journal_mac is used + SB_FLAG_RECALCULATING + - recalculating is in progress + SB_FLAG_DIRTY_BITMAP + - journal area contains the bitmap of dirty + blocks * log2(sectors per block) * a position where recalculating finished * journal The journal is divided into sections, each section contains: + * metadata area (4kiB), it contains journal entries - every journal entry contains: + + - every journal entry contains: + * logical sector (specifies where the data and tag should be written) * last 8 bytes of data * integrity tag (the size is specified in the superblock) - every metadata sector ends with + + - every metadata sector ends with + * mac (8-bytes), all the macs in 8 metadata sectors form a 64-byte value. It is used to store hmac of sector numbers in the journal section, to protect against a possibility that the attacker tampers with sector numbers in the journal. * commit id + * data area (the size is variable; it depends on how many journal entries fit into the metadata area) - every sector in the data area contains: + + - every sector in the data area contains: + * data (504 bytes of data, the last 8 bytes are stored in the journal entry) * commit id + To test if the whole journal section was written correctly, every 512-byte sector of the journal ends with 8-byte commit id. If the commit id matches on all sectors in a journal section, then it is assumed that the section was written correctly. If the commit id doesn't match, the section was written partially and it should not be replayed. -* one or more runs of interleaved tags and data. Each run contains: + +* one or more runs of interleaved tags and data. 
+ Each run contains: + * tag area - it contains integrity tags. There is one tag for each sector in the data area * data area - it contains data sectors. The number of data sectors diff --git a/Documentation/device-mapper/dm-io.txt b/Documentation/device-mapper/dm-io.rst similarity index 92% rename from Documentation/device-mapper/dm-io.txt rename to Documentation/device-mapper/dm-io.rst index 3b5d9a52cdcf..d2492917a1f5 100644 --- a/Documentation/device-mapper/dm-io.txt +++ b/Documentation/device-mapper/dm-io.rst @@ -1,3 +1,4 @@ +===== dm-io ===== @@ -7,7 +8,7 @@ version. The user must set up an io_region structure to describe the desired location of the I/O. Each io_region indicates a block-device along with the starting -sector and size of the region. +sector and size of the region:: struct io_region { struct block_device *bdev; @@ -19,7 +20,7 @@ Dm-io can read from one io_region or write to one or more io_regions. Writes to multiple regions are specified by an array of io_region structures. The first I/O service type takes a list of memory pages as the data buffer for -the I/O, along with an offset into the first page. +the I/O, along with an offset into the first page:: struct page_list { struct page_list *next; @@ -35,7 +36,7 @@ the I/O, along with an offset into the first page. The second I/O service type takes an array of bio vectors as the data buffer for the I/O. This service can be handy if the caller has a pre-assembled bio, -but wants to direct different portions of the bio to different devices. +but wants to direct different portions of the bio to different devices:: int dm_io_sync_bvec(unsigned int num_regions, struct io_region *where, int rw, struct bio_vec *bvec, @@ -47,7 +48,7 @@ but wants to direct different portions of the bio to different devices. The third I/O service type takes a pointer to a vmalloc'd memory buffer as the data buffer for the I/O. 
This service can be handy if the caller needs to do I/O to a large region but doesn't want to allocate a large number of individual -memory pages. +memory pages:: int dm_io_sync_vm(unsigned int num_regions, struct io_region *where, int rw, void *data, unsigned long *error_bits); @@ -55,11 +56,11 @@ memory pages. void *data, io_notify_fn fn, void *context); Callers of the asynchronous I/O services must include the name of a completion -callback routine and a pointer to some context data for the I/O. +callback routine and a pointer to some context data for the I/O:: typedef void (*io_notify_fn)(unsigned long error, void *context); -The "error" parameter in this callback, as well as the "*error" parameter in +The "error" parameter in this callback, as well as the `*error` parameter in all of the synchronous versions, is a bitset (instead of a simple error value). In the case of a write-I/O to multiple regions, this bitset allows dm-io to indicate success or failure on each individual region. @@ -72,4 +73,3 @@ always available in order to avoid unnecessary waiting while performing I/O. When the user is finished using the dm-io services, they should call dm_io_put() and specify the same number of pages that were given on the dm_io_get() call. - diff --git a/Documentation/device-mapper/dm-log.txt b/Documentation/device-mapper/dm-log.rst similarity index 90% rename from Documentation/device-mapper/dm-log.txt rename to Documentation/device-mapper/dm-log.rst index c155ac569c44..ba4fce39bc27 100644 --- a/Documentation/device-mapper/dm-log.txt +++ b/Documentation/device-mapper/dm-log.rst @@ -1,3 +1,4 @@ +===================== Device-Mapper Logging ===================== The device-mapper logging code is used by some of the device-mapper @@ -16,11 +17,13 @@ dm_dirty_log_type in include/linux/dm-dirty-log.h). Various different logging implementations are available and provide different capabilities.
The list includes: +============== ============================================================== Type Files -==== ===== +============== ============================================================== disk drivers/md/dm-log.c core drivers/md/dm-log.c userspace drivers/md/dm-log-userspace* include/linux/dm-log-userspace.h +============== ============================================================== The "disk" log type ------------------- diff --git a/Documentation/device-mapper/dm-queue-length.txt b/Documentation/device-mapper/dm-queue-length.rst similarity index 76% rename from Documentation/device-mapper/dm-queue-length.txt rename to Documentation/device-mapper/dm-queue-length.rst index f4db2562175c..d8e381c1cb02 100644 --- a/Documentation/device-mapper/dm-queue-length.txt +++ b/Documentation/device-mapper/dm-queue-length.rst @@ -1,3 +1,4 @@ +=============== dm-queue-length =============== @@ -6,12 +7,18 @@ which selects a path with the least number of in-flight I/Os. The path selector name is 'queue-length'. Table parameters for each path: [] + +:: + : The number of I/Os to dispatch using the selected path before switching to the next path. If not given, internal default is used. To check the default value, see the activated table. Status for each path: + +:: + : 'A' if the path is active, 'F' if the path is failed. : The number of path failures. : The number of in-flight I/Os on the path. @@ -29,11 +36,13 @@ Examples ======== In case that 2 paths (sda and sdb) are used with repeat_count == 128. 
-# echo "0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128" \ - dmsetup create test -# -# dmsetup table -test: 0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128 -# -# dmsetup status -test: 0 10 multipath 2 0 0 0 1 1 E 0 2 1 8:0 A 0 0 8:16 A 0 0 +:: + + # echo "0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128" \ + dmsetup create test + # + # dmsetup table + test: 0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128 + # + # dmsetup status + test: 0 10 multipath 2 0 0 0 1 1 E 0 2 1 8:0 A 0 0 8:16 A 0 0 diff --git a/Documentation/device-mapper/dm-raid.txt b/Documentation/device-mapper/dm-raid.rst similarity index 71% rename from Documentation/device-mapper/dm-raid.txt rename to Documentation/device-mapper/dm-raid.rst index 2355bef14653..2fe255b130fb 100644 --- a/Documentation/device-mapper/dm-raid.txt +++ b/Documentation/device-mapper/dm-raid.rst @@ -1,3 +1,4 @@ +======= dm-raid ======= @@ -8,49 +9,66 @@ interface. Mapping Table Interface ----------------------- -The target is named "raid" and it accepts the following parameters: +The target is named "raid" and it accepts the following parameters:: <#raid_params> \ <#raid_devs> [.. 
] : + + ============= =============================================================== raid0 RAID0 striping (no resilience) raid1 RAID1 mirroring raid4 RAID4 with dedicated last parity disk raid5_n RAID5 with dedicated last parity disk supporting takeover Same as raid4 - -Transitory layout + + - Transitory layout raid5_la RAID5 left asymmetric + - rotating parity 0 with data continuation raid5_ra RAID5 right asymmetric + - rotating parity N with data continuation raid5_ls RAID5 left symmetric + - rotating parity 0 with data restart raid5_rs RAID5 right symmetric + - rotating parity N with data restart raid6_zr RAID6 zero restart + - rotating parity zero (left-to-right) with data restart raid6_nr RAID6 N restart + - rotating parity N (right-to-left) with data restart raid6_nc RAID6 N continue + - rotating parity N (right-to-left) with data continuation raid6_n_6 RAID6 with dedicate parity disks + - parity and Q-syndrome on the last 2 disks; layout for takeover from/to raid4/raid5_n raid6_la_6 Same as "raid_la" plus dedicated last Q-syndrome disk + - layout for takeover from raid5_la from/to raid6 raid6_ra_6 Same as "raid5_ra" dedicated last Q-syndrome disk + - layout for takeover from raid5_ra from/to raid6 raid6_ls_6 Same as "raid5_ls" dedicated last Q-syndrome disk + - layout for takeover from raid5_ls from/to raid6 raid6_rs_6 Same as "raid5_rs" dedicated last Q-syndrome disk + - layout for takeover from raid5_rs from/to raid6 raid10 Various RAID10 inspired algorithms chosen by additional params (see raid10_format and raid10_copies below) + - RAID10: Striped Mirrors (aka 'Striping on top of mirrors') - RAID1E: Integrated Adjacent Stripe Mirroring - RAID1E: Integrated Offset Stripe Mirroring - - and other similar RAID10 variants + - and other similar RAID10 variants + ============= =============================================================== Reference: Chapter 4 of http://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf @@ -58,33 +76,41 @@ 
The target is named "raid" and it accepts the following parameters: <#raid_params>: The number of parameters that follow. consists of + Mandatory parameters: - : Chunk size in sectors. This parameter is often known as + : + Chunk size in sectors. This parameter is often known as "stripe size". It is the only mandatory parameter and is placed first. followed by optional parameters (in any order): - [sync|nosync] Force or prevent RAID initialization. + [sync|nosync] + Force or prevent RAID initialization. - [rebuild ] Rebuild drive number 'idx' (first drive is 0). + [rebuild ] + Rebuild drive number 'idx' (first drive is 0). [daemon_sleep ] Interval between runs of the bitmap daemon that clear bits. A longer interval means less bitmap I/O but resyncing after a failure is likely to take longer. - [min_recovery_rate ] Throttle RAID initialization - [max_recovery_rate ] Throttle RAID initialization - [write_mostly ] Mark drive index 'idx' write-mostly. - [max_write_behind ] See '--write-behind=' (man mdadm) - [stripe_cache ] Stripe cache size (RAID 4/5/6 only) + [min_recovery_rate ] + Throttle RAID initialization + [max_recovery_rate ] + Throttle RAID initialization + [write_mostly ] + Mark drive index 'idx' write-mostly. + [max_write_behind ] + See '--write-behind=' (man mdadm) + [stripe_cache ] + Stripe cache size (RAID 4/5/6 only) [region_size ] The region_size multiplied by the number of regions is the logical size of the array. The bitmap records the device synchronisation state for each region. - [raid10_copies <# copies>] - [raid10_format ] + [raid10_copies <# copies>], [raid10_format ] These two options are used to alter the default layout of a RAID10 configuration. The number of copies is can be specified, but the default is 2. There are also three @@ -93,13 +119,17 @@ The target is named "raid" and it accepts the following parameters: respect to mirroring. 
If these options are left unspecified, or 'raid10_copies 2' and/or 'raid10_format near' are given, then the layouts for 2, 3 and 4 devices are: + + ======== ========== ============== 2 drives 3 drives 4 drives - -------- ---------- -------------- + ======== ========== ============== A1 A1 A1 A1 A2 A1 A1 A2 A2 A2 A2 A2 A3 A3 A3 A3 A4 A4 A3 A3 A4 A4 A5 A5 A5 A6 A6 A4 A4 A5 A6 A6 A7 A7 A8 A8 .. .. .. .. .. .. .. .. .. + ======== ========== ============== + The 2-device layout is equivalent 2-way RAID1. The 4-device layout is what a traditional RAID10 would look like. The 3-device layout is what might be called a 'RAID1E - Integrated @@ -107,8 +137,10 @@ The target is named "raid" and it accepts the following parameters: If 'raid10_copies 2' and 'raid10_format far', then the layouts for 2, 3 and 4 devices are: + + ======== ============ =================== 2 drives 3 drives 4 drives - -------- -------------- -------------------- + ======== ============ =================== A1 A2 A1 A2 A3 A1 A2 A3 A4 A3 A4 A4 A5 A6 A5 A6 A7 A8 A5 A6 A7 A8 A9 A9 A10 A11 A12 @@ -117,11 +149,14 @@ The target is named "raid" and it accepts the following parameters: A4 A3 A6 A4 A5 A6 A5 A8 A7 A6 A5 A9 A7 A8 A10 A9 A12 A11 .. .. .. .. .. .. .. .. .. + ======== ============ =================== If 'raid10_copies 2' and 'raid10_format offset', then the layouts for 2, 3 and 4 devices are: + + ======== ========== ================ 2 drives 3 drives 4 drives - -------- ------------ ----------------- + ======== ========== ================ A1 A2 A1 A2 A3 A1 A2 A3 A4 A2 A1 A3 A1 A2 A2 A1 A4 A3 A3 A4 A4 A5 A6 A5 A6 A7 A8 @@ -129,6 +164,8 @@ The target is named "raid" and it accepts the following parameters: A5 A6 A7 A8 A9 A9 A10 A11 A12 A6 A5 A9 A7 A8 A10 A9 A12 A11 .. .. .. .. .. .. .. .. .. + ======== ========== ================ + Here we see layouts closely akin to 'RAID1E - Integrated Offset Stripe Mirroring'. 
@@ -190,22 +227,25 @@ The target is named "raid" and it accepts the following parameters: Example Tables -------------- -# RAID4 - 4 data drives, 1 parity (no metadata devices) -# No metadata devices specified to hold superblock/bitmap info -# Chunk size of 1MiB -# (Lines separated for easy reading) -0 1960893648 raid \ - raid4 1 2048 \ - 5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81 +:: -# RAID4 - 4 data drives, 1 parity (with metadata devices) -# Chunk size of 1MiB, force RAID initialization, -# min recovery rate at 20 kiB/sec/disk + # RAID4 - 4 data drives, 1 parity (no metadata devices) + # No metadata devices specified to hold superblock/bitmap info + # Chunk size of 1MiB + # (Lines separated for easy reading) -0 1960893648 raid \ - raid4 4 2048 sync min_recovery_rate 20 \ - 5 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66 8:81 8:82 + 0 1960893648 raid \ + raid4 1 2048 \ + 5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81 + + # RAID4 - 4 data drives, 1 parity (with metadata devices) + # Chunk size of 1MiB, force RAID initialization, + # min recovery rate at 20 kiB/sec/disk + + 0 1960893648 raid \ + raid4 4 2048 sync min_recovery_rate 20 \ + 5 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66 8:81 8:82 Status Output @@ -219,41 +259,58 @@ Arguments that can be repeated are ordered by value. 'dmsetup status' yields information on the state and health of the array. The output is as follows (normally a single line, but expanded here for -clarity): -1: raid \ -2: <#devices> \ -3: +clarity):: + + 1: raid \ + 2: <#devices> \ + 3: Line 1 is the standard output produced by device-mapper. -Line 2 & 3 are produced by the raid target and are best explained by example: + +Line 2 & 3 are produced by the raid target and are best explained by example:: + 0 1960893648 raid raid4 5 AAAAA 2/490221568 init 0 + Here we can see the RAID type is raid4, there are 5 devices - all of which are 'A'live, and the array is 2/490221568 complete with its initial recovery. 
Here is a fuller description of the individual fields: + + =============== ========================================================= Same as the used to create the array. - One char for each device, indicating: 'A' = alive and - in-sync, 'a' = alive but not in-sync, 'D' = dead/failed. + One char for each device, indicating: + + - 'A' = alive and in-sync + - 'a' = alive but not in-sync + - 'D' = dead/failed. The ratio indicating how much of the array has undergone the process described by 'sync_action'. If the 'sync_action' is "check" or "repair", then the process of "resync" or "recover" can be considered complete. One of the following possible states: - idle - No synchronization action is being performed. - frozen - The current action has been halted. - resync - Array is undergoing its initial synchronization + + idle + - No synchronization action is being performed. + frozen + - The current action has been halted. + resync + - Array is undergoing its initial synchronization or is resynchronizing after an unclean shutdown (possibly aided by a bitmap). - recover - A device in the array is being rebuilt or + recover + - A device in the array is being rebuilt or replaced. - check - A user-initiated full check of the array is + check + - A user-initiated full check of the array is being performed. All blocks are read and checked for consistency. The number of discrepancies found are recorded in . No changes are made to the array by this action. - repair - The same as "check", but discrepancies are + repair + - The same as "check", but discrepancies are corrected. - reshape - The array is undergoing a reshape. + reshape + - The array is undergoing a reshape. The number of discrepancies found between mirror copies in RAID1/10 or wrong parity values found in RAID4/5/6. This value is valid only after a "check" of the array @@ -261,10 +318,11 @@ recovery. 
Here is a fuller description of the individual fields: The current data offset to the start of the user data on each component device of a raid set (see the respective raid parameter to support out-of-place reshaping). - 'A' - active write-through journal device. - 'a' - active write-back journal device. - 'D' - dead journal device. - '-' - no journal device. + - 'A' - active write-through journal device. + - 'a' - active write-back journal device. + - 'D' - dead journal device. + - '-' - no journal device. + =============== ========================================================= Message Interface @@ -272,12 +330,15 @@ Message Interface The dm-raid target will accept certain actions through the 'message' interface. ('man dmsetup' for more information on the message interface.) These actions include: - "idle" - Halt the current sync action. - "frozen" - Freeze the current sync action. - "resync" - Initiate/continue a resync. - "recover"- Initiate/continue a recover process. - "check" - Initiate a check (i.e. a "scrub") of the array. - "repair" - Initiate a repair of the array. + + ========= ================================================ + "idle" Halt the current sync action. + "frozen" Freeze the current sync action. + "resync" Initiate/continue a resync. + "recover" Initiate/continue a recover process. + "check" Initiate a check (i.e. a "scrub") of the array. + "repair" Initiate a repair of the array. + ========= ================================================ Discard Support @@ -307,48 +368,52 @@ increasingly whitelisted in the kernel and can thus be trusted. For trusted devices, the following dm-raid module parameter can be set to safely enable discard support for RAID 4/5/6: + 'devices_handle_discards_safely' Version History --------------- -1.0.0 Initial version. Support for RAID 4/5/6 -1.1.0 Added support for RAID 1 -1.2.0 Handle creation of arrays that contain failed devices. 
-1.3.0 Added support for RAID 10 -1.3.1 Allow device replacement/rebuild for RAID 10 -1.3.2 Fix/improve redundancy checking for RAID10 -1.4.0 Non-functional change. Removes arg from mapping function. -1.4.1 RAID10 fix redundancy validation checks (commit 55ebbb5). -1.4.2 Add RAID10 "far" and "offset" algorithm support. -1.5.0 Add message interface to allow manipulation of the sync_action. + +:: + + 1.0.0 Initial version. Support for RAID 4/5/6 + 1.1.0 Added support for RAID 1 + 1.2.0 Handle creation of arrays that contain failed devices. + 1.3.0 Added support for RAID 10 + 1.3.1 Allow device replacement/rebuild for RAID 10 + 1.3.2 Fix/improve redundancy checking for RAID10 + 1.4.0 Non-functional change. Removes arg from mapping function. + 1.4.1 RAID10 fix redundancy validation checks (commit 55ebbb5). + 1.4.2 Add RAID10 "far" and "offset" algorithm support. + 1.5.0 Add message interface to allow manipulation of the sync_action. New status (STATUSTYPE_INFO) fields: sync_action and mismatch_cnt. -1.5.1 Add ability to restore transiently failed devices on resume. -1.5.2 'mismatch_cnt' is zero unless [last_]sync_action is "check". -1.6.0 Add discard support (and devices_handle_discard_safely module param). -1.7.0 Add support for MD RAID0 mappings. -1.8.0 Explicitly check for compatible flags in the superblock metadata + 1.5.1 Add ability to restore transiently failed devices on resume. + 1.5.2 'mismatch_cnt' is zero unless [last_]sync_action is "check". + 1.6.0 Add discard support (and devices_handle_discard_safely module param). + 1.7.0 Add support for MD RAID0 mappings. + 1.8.0 Explicitly check for compatible flags in the superblock metadata and reject to start the raid set if any are set by a newer target version, thus avoiding data corruption on a raid set with a reshape in progress. -1.9.0 Add support for RAID level takeover/reshape/region size + 1.9.0 Add support for RAID level takeover/reshape/region size and set size reduction. 
-1.9.1 Fix activation of existing RAID 4/10 mapped devices -1.9.2 Don't emit '- -' on the status table line in case the constructor + 1.9.1 Fix activation of existing RAID 4/10 mapped devices + 1.9.2 Don't emit '- -' on the status table line in case the constructor fails reading a superblock. Correctly emit 'maj:min1 maj:min2' and 'D' on the status line. If '- -' is passed into the constructor, emit '- -' on the table line and '-' as the status line health character. -1.10.0 Add support for raid4/5/6 journal device -1.10.1 Fix data corruption on reshape request -1.11.0 Fix table line argument order + 1.10.0 Add support for raid4/5/6 journal device + 1.10.1 Fix data corruption on reshape request + 1.11.0 Fix table line argument order (wrong raid10_copies/raid10_format sequence) -1.11.1 Add raid4/5/6 journal write-back support via journal_mode option -1.12.1 Fix for MD deadlock between mddev_suspend() and md_write_start() available -1.13.0 Fix dev_health status at end of "recover" (was 'a', now 'A') -1.13.1 Fix deadlock caused by early md_stop_writes(). Also fix size an + 1.11.1 Add raid4/5/6 journal write-back support via journal_mode option + 1.12.1 Fix for MD deadlock between mddev_suspend() and md_write_start() available + 1.13.0 Fix dev_health status at end of "recover" (was 'a', now 'A') + 1.13.1 Fix deadlock caused by early md_stop_writes(). Also fix size an state races. -1.13.2 Fix raid redundancy validation and avoid keeping raid set frozen -1.14.0 Fix reshape race on small devices. Fix stripe adding reshape + 1.13.2 Fix raid redundancy validation and avoid keeping raid set frozen + 1.14.0 Fix reshape race on small devices. Fix stripe adding reshape deadlock/potential data corruption. Update superblock when specific devices are requested via rebuild. Fix RAID leg rebuild errors. 
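As an aside to the dm-raid status description in the hunks above, the single-line `dmsetup status` output can be split into its fields mechanically. The following is a minimal sketch, not part of the patch: the class name, the field names and the helper `failed_devices` are illustrative labels chosen here, and the parser assumes the line has no leading "name:" prefix and looks like the raid4 example quoted in the documentation. Trailing fields reported by newer target versions (data offset, journal character) are ignored.

```python
from dataclasses import dataclass


@dataclass
class RaidStatus:
    # All field names are hypothetical labels for this sketch.
    start: int          # first sector of the mapping
    length: int         # length of the mapping in sectors
    raid_type: str      # e.g. "raid4"
    num_devices: int    # number of devices in the array
    health_chars: str   # one character per device: 'A', 'a' or 'D'
    sync_done: int      # numerator of the sync ratio
    sync_total: int     # denominator of the sync ratio
    sync_action: str    # e.g. "idle", "resync", "recover"
    mismatch_cnt: int   # discrepancies found by a "check"


def parse_raid_status(line: str) -> RaidStatus:
    """Parse one dm-raid status line into a RaidStatus record."""
    fields = line.split()
    if len(fields) < 9 or fields[2] != "raid":
        raise ValueError("not a dm-raid status line: %r" % line)
    done, total = fields[6].split("/")
    return RaidStatus(
        start=int(fields[0]),
        length=int(fields[1]),
        raid_type=fields[3],
        num_devices=int(fields[4]),
        health_chars=fields[5],
        sync_done=int(done),
        sync_total=int(total),
        sync_action=fields[7],
        mismatch_cnt=int(fields[8]),
    )


def failed_devices(status: RaidStatus) -> list:
    """Return the indices of devices whose health character is 'D'."""
    return [i for i, c in enumerate(status.health_chars) if c == "D"]


if __name__ == "__main__":
    # Status line taken from the example in the documentation.
    example = "0 1960893648 raid raid4 5 AAAAA 2/490221568 init 0"
    status = parse_raid_status(example)
    print(status.raid_type, status.health_chars, failed_devices(status))
```

Keeping the sync ratio as two integers rather than a float avoids rounding when the denominator is large, which is the common case for multi-terabyte arrays.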
diff --git a/Documentation/device-mapper/dm-service-time.txt b/Documentation/device-mapper/dm-service-time.rst similarity index 60% rename from Documentation/device-mapper/dm-service-time.txt rename to Documentation/device-mapper/dm-service-time.rst index fb1d4a0cf122..facf277fc13c 100644 --- a/Documentation/device-mapper/dm-service-time.txt +++ b/Documentation/device-mapper/dm-service-time.rst @@ -1,3 +1,4 @@ +=============== dm-service-time =============== @@ -12,25 +13,34 @@ in a path-group, and it can be specified as a table argument. The path selector name is 'service-time'. -Table parameters for each path: [ []] - : The number of I/Os to dispatch using the selected +Table parameters for each path: + + [ []] + : + The number of I/Os to dispatch using the selected path before switching to the next path. If not given, internal default is used. To check the default value, see the activated table. - : The relative throughput value of the path + : + The relative throughput value of the path among all paths in the path-group. The valid range is 0-100. If not given, minimum value '1' is used. If '0' is given, the path isn't selected while other paths having a positive value are available. -Status for each path: \ - - : 'A' if the path is active, 'F' if the path is failed. - : The number of path failures. - : The size of in-flight I/Os on the path. - : The relative throughput value of the path - among all paths in the path-group. +Status for each path: + + + : + 'A' if the path is active, 'F' if the path is failed. + : + The number of path failures. + : + The size of in-flight I/Os on the path. + : + The relative throughput value of the path + among all paths in the path-group. Algorithm @@ -39,7 +49,7 @@ Algorithm dm-service-time adds the I/O size to 'in-flight-size' when the I/O is dispatched and subtracts when completed. 
Basically, dm-service-time selects a path having minimum service time -which is calculated by: +which is calculated by:: ('in-flight-size' + 'size-of-incoming-io') / 'relative_throughput' @@ -67,25 +77,25 @@ Examples ======== In case that 2 paths (sda and sdb) are used with repeat_count == 128 and sda has an average throughput 1GB/s and sdb has 4GB/s, -'relative_throughput' value may be '1' for sda and '4' for sdb. +'relative_throughput' value may be '1' for sda and '4' for sdb:: -# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4" \ - dmsetup create test -# -# dmsetup table -test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4 -# -# dmsetup status -test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 1 8:16 A 0 0 4 + # echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4" \ + dmsetup create test + # + # dmsetup table + test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4 + # + # dmsetup status + test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 1 8:16 A 0 0 4 -Or '2' for sda and '8' for sdb would be also true. 
+Or '2' for sda and '8' for sdb would be also true:: -# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8" \ - dmsetup create test -# -# dmsetup table -test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8 -# -# dmsetup status -test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 2 8:16 A 0 0 8 + # echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8" \ + dmsetup create test + # + # dmsetup table + test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8 + # + # dmsetup status + test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 2 8:16 A 0 0 8 diff --git a/Documentation/device-mapper/dm-uevent.rst b/Documentation/device-mapper/dm-uevent.rst new file mode 100644 index 000000000000..4a8ee8d069c9 --- /dev/null +++ b/Documentation/device-mapper/dm-uevent.rst @@ -0,0 +1,110 @@ +==================== +device-mapper uevent +==================== + +The device-mapper uevent code adds the capability to device-mapper to create +and send kobject uevents (uevents). Previously device-mapper events were only +available through the ioctl interface. The advantage of the uevents interface +is the event contains environment attributes providing increased context for +the event avoiding the need to query the state of the device-mapper device after +the event is received. + +There are two functions currently for device-mapper events. The first function +listed creates the event and the second function sends the event(s):: + + void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target *ti, + const char *path, unsigned nr_valid_paths) + + void dm_send_uevents(struct list_head *events, struct kobject *kobj) + + +The variables added to the uevent environment are: + +Variable Name: DM_TARGET +------------------------ +:Uevent Action(s): KOBJ_CHANGE +:Type: string +:Description: +:Value: Name of device-mapper target that generated the event. 
+ +Variable Name: DM_ACTION +------------------------ +:Uevent Action(s): KOBJ_CHANGE +:Type: string +:Description: +:Value: Device-mapper specific action that caused the uevent action. + PATH_FAILED - A path has failed; + PATH_REINSTATED - A path has been reinstated. + +Variable Name: DM_SEQNUM +------------------------ +:Uevent Action(s): KOBJ_CHANGE +:Type: unsigned integer +:Description: A sequence number for this specific device-mapper device. +:Value: Valid unsigned integer range. + +Variable Name: DM_PATH +---------------------- +:Uevent Action(s): KOBJ_CHANGE +:Type: string +:Description: Major and minor number of the path device pertaining to this + event. +:Value: Path name in the form of "Major:Minor" + +Variable Name: DM_NR_VALID_PATHS +-------------------------------- +:Uevent Action(s): KOBJ_CHANGE +:Type: unsigned integer +:Description: +:Value: Valid unsigned integer range. + +Variable Name: DM_NAME +---------------------- +:Uevent Action(s): KOBJ_CHANGE +:Type: string +:Description: Name of the device-mapper device. +:Value: Name + +Variable Name: DM_UUID +---------------------- +:Uevent Action(s): KOBJ_CHANGE +:Type: string +:Description: UUID of the device-mapper device. +:Value: UUID. (Empty string if there isn't one.) + +An example of the uevents generated as captured by udevmonitor is shown +below + +1.) Path failure:: + + UEVENT[1192521009.711215] change@/block/dm-3 + ACTION=change + DEVPATH=/block/dm-3 + SUBSYSTEM=block + DM_TARGET=multipath + DM_ACTION=PATH_FAILED + DM_SEQNUM=1 + DM_PATH=8:32 + DM_NR_VALID_PATHS=0 + DM_NAME=mpath2 + DM_UUID=mpath-35333333000002328 + MINOR=3 + MAJOR=253 + SEQNUM=1130 + +2.) 
Path reinstate:: + + UEVENT[1192521132.989927] change@/block/dm-3 + ACTION=change + DEVPATH=/block/dm-3 + SUBSYSTEM=block + DM_TARGET=multipath + DM_ACTION=PATH_REINSTATED + DM_SEQNUM=2 + DM_PATH=8:32 + DM_NR_VALID_PATHS=1 + DM_NAME=mpath2 + DM_UUID=mpath-35333333000002328 + MINOR=3 + MAJOR=253 + SEQNUM=1131 diff --git a/Documentation/device-mapper/dm-uevent.txt b/Documentation/device-mapper/dm-uevent.txt deleted file mode 100644 index 07edbd85c714..000000000000 --- a/Documentation/device-mapper/dm-uevent.txt +++ /dev/null @@ -1,97 +0,0 @@ -The device-mapper uevent code adds the capability to device-mapper to create -and send kobject uevents (uevents). Previously device-mapper events were only -available through the ioctl interface. The advantage of the uevents interface -is the event contains environment attributes providing increased context for -the event avoiding the need to query the state of the device-mapper device after -the event is received. - -There are two functions currently for device-mapper events. The first function -listed creates the event and the second function sends the event(s). - -void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target *ti, - const char *path, unsigned nr_valid_paths) - -void dm_send_uevents(struct list_head *events, struct kobject *kobj) - - -The variables added to the uevent environment are: - -Variable Name: DM_TARGET -Uevent Action(s): KOBJ_CHANGE -Type: string -Description: -Value: Name of device-mapper target that generated the event. - -Variable Name: DM_ACTION -Uevent Action(s): KOBJ_CHANGE -Type: string -Description: -Value: Device-mapper specific action that caused the uevent action. - PATH_FAILED - A path has failed. - PATH_REINSTATED - A path has been reinstated. - -Variable Name: DM_SEQNUM -Uevent Action(s): KOBJ_CHANGE -Type: unsigned integer -Description: A sequence number for this specific device-mapper device. -Value: Valid unsigned integer range. 
- -Variable Name: DM_PATH -Uevent Action(s): KOBJ_CHANGE -Type: string -Description: Major and minor number of the path device pertaining to this -event. -Value: Path name in the form of "Major:Minor" - -Variable Name: DM_NR_VALID_PATHS -Uevent Action(s): KOBJ_CHANGE -Type: unsigned integer -Description: -Value: Valid unsigned integer range. - -Variable Name: DM_NAME -Uevent Action(s): KOBJ_CHANGE -Type: string -Description: Name of the device-mapper device. -Value: Name - -Variable Name: DM_UUID -Uevent Action(s): KOBJ_CHANGE -Type: string -Description: UUID of the device-mapper device. -Value: UUID. (Empty string if there isn't one.) - -An example of the uevents generated as captured by udevmonitor is shown -below. - -1.) Path failure. -UEVENT[1192521009.711215] change@/block/dm-3 -ACTION=change -DEVPATH=/block/dm-3 -SUBSYSTEM=block -DM_TARGET=multipath -DM_ACTION=PATH_FAILED -DM_SEQNUM=1 -DM_PATH=8:32 -DM_NR_VALID_PATHS=0 -DM_NAME=mpath2 -DM_UUID=mpath-35333333000002328 -MINOR=3 -MAJOR=253 -SEQNUM=1130 - -2.) Path reinstate. -UEVENT[1192521132.989927] change@/block/dm-3 -ACTION=change -DEVPATH=/block/dm-3 -SUBSYSTEM=block -DM_TARGET=multipath -DM_ACTION=PATH_REINSTATED -DM_SEQNUM=2 -DM_PATH=8:32 -DM_NR_VALID_PATHS=1 -DM_NAME=mpath2 -DM_UUID=mpath-35333333000002328 -MINOR=3 -MAJOR=253 -SEQNUM=1131 diff --git a/Documentation/device-mapper/dm-zoned.txt b/Documentation/device-mapper/dm-zoned.rst similarity index 97% rename from Documentation/device-mapper/dm-zoned.txt rename to Documentation/device-mapper/dm-zoned.rst index 736fcc78d193..07f56ebc1730 100644 --- a/Documentation/device-mapper/dm-zoned.txt +++ b/Documentation/device-mapper/dm-zoned.rst @@ -1,3 +1,4 @@ +======== dm-zoned ======== @@ -133,12 +134,13 @@ A zoned block device must first be formatted using the dmzadm tool. This will analyze the device zone configuration, determine where to place the metadata sets on the device and initialize the metadata sets. 
-Ex: +Ex:: -dmzadm --format /dev/sdxx + dmzadm --format /dev/sdxx For a formatted device, the target can be created normally with the dmsetup utility. The only parameter that dm-zoned requires is the -underlying zoned block device name. Ex: +underlying zoned block device name. Ex:: -echo "0 `blockdev --getsize ${dev}` zoned ${dev}" | dmsetup create dmz-`basename ${dev}` + echo "0 `blockdev --getsize ${dev}` zoned ${dev}" | \ + dmsetup create dmz-`basename ${dev}` diff --git a/Documentation/device-mapper/era.txt b/Documentation/device-mapper/era.rst similarity index 70% rename from Documentation/device-mapper/era.txt rename to Documentation/device-mapper/era.rst index 3c6d01be3560..90dd5c670b9f 100644 --- a/Documentation/device-mapper/era.txt +++ b/Documentation/device-mapper/era.rst @@ -1,3 +1,7 @@ +====== +dm-era +====== + Introduction ============ @@ -14,12 +18,14 @@ coherency after rolling back a vendor snapshot. Constructor =========== - era +era - metadata dev : fast device holding the persistent metadata - origin dev : device holding data blocks that may change - block size : block size of origin data device, granularity that is - tracked by the target + ================ ====================================================== + metadata dev fast device holding the persistent metadata + origin dev device holding data blocks that may change + block size block size of origin data device, granularity that is + tracked by the target + ================ ====================================================== Messages ======== @@ -49,14 +55,16 @@ Status <#used metadata blocks>/<#total metadata blocks> -metadata block size : Fixed block size for each metadata block in - sectors -#used metadata blocks : Number of metadata blocks used -#total metadata blocks : Total number of metadata blocks -current era : The current era -held metadata root : The location, in blocks, of the metadata root - that has been 'held' for userspace read - access. 
'-' indicates there is no held root +========================= ============================================== +metadata block size Fixed block size for each metadata block in + sectors +#used metadata blocks Number of metadata blocks used +#total metadata blocks Total number of metadata blocks +current era The current era +held metadata root The location, in blocks, of the metadata root + that has been 'held' for userspace read + access. '-' indicates there is no held root +========================= ============================================== Detailed use case ================= @@ -88,7 +96,7 @@ Memory usage The target uses a bitset to record writes in the current era. It also has a spare bitset ready for switching over to a new era. Other than -that it uses a few 4k blocks for updating metadata. +that it uses a few 4k blocks for updating metadata:: (4 * nr_blocks) bytes + buffers diff --git a/Documentation/device-mapper/index.rst b/Documentation/device-mapper/index.rst new file mode 100644 index 000000000000..105e253bc231 --- /dev/null +++ b/Documentation/device-mapper/index.rst @@ -0,0 +1,44 @@ +:orphan: + +============= +Device Mapper +============= + +.. toctree:: + :maxdepth: 1 + + cache-policies + cache + delay + dm-crypt + dm-flakey + dm-init + dm-integrity + dm-io + dm-log + dm-queue-length + dm-raid + dm-service-time + dm-uevent + dm-zoned + era + kcopyd + linear + log-writes + persistent-data + snapshot + statistics + striped + switch + thin-provisioning + unstriped + verity + writecache + zero + +.. 
only:: subproject and html + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/device-mapper/kcopyd.txt b/Documentation/device-mapper/kcopyd.rst similarity index 93% rename from Documentation/device-mapper/kcopyd.txt rename to Documentation/device-mapper/kcopyd.rst index 820382c4cecf..7651d395127f 100644 --- a/Documentation/device-mapper/kcopyd.txt +++ b/Documentation/device-mapper/kcopyd.rst @@ -1,3 +1,4 @@ +====== kcopyd ====== @@ -7,7 +8,7 @@ notification. It is used by dm-snapshot and dm-mirror. Users of kcopyd must first create a client and indicate how many memory pages to set aside for their copy jobs. This is done with a call to -kcopyd_client_create(). +kcopyd_client_create():: int kcopyd_client_create(unsigned int num_pages, struct kcopyd_client **result); @@ -16,7 +17,7 @@ To start a copy job, the user must set up io_region structures to describe the source and destinations of the copy. Each io_region indicates a block-device along with the starting sector and size of the region. The source of the copy is given as one io_region structure, and the destinations of the -copy are given as an array of io_region structures. +copy are given as an array of io_region structures:: struct io_region { struct block_device *bdev; @@ -26,7 +27,7 @@ copy are given as an array of io_region structures. To start the copy, the user calls kcopyd_copy(), passing in the client pointer, pointers to the source and destination io_regions, the name of a -completion callback routine, and a pointer to some context data for the copy. +completion callback routine, and a pointer to some context data for the copy:: int kcopyd_copy(struct kcopyd_client *kc, struct io_region *from, unsigned int num_dests, struct io_region *dests, @@ -41,7 +42,6 @@ write error occurred during the copy. When a user is done with all their copy jobs, they should call kcopyd_client_destroy() to delete the kcopyd client, which will release the -associated memory pages. 
+associated memory pages:: void kcopyd_client_destroy(struct kcopyd_client *kc); - diff --git a/Documentation/device-mapper/linear.rst b/Documentation/device-mapper/linear.rst new file mode 100644 index 000000000000..9d17fc6e64a9 --- /dev/null +++ b/Documentation/device-mapper/linear.rst @@ -0,0 +1,63 @@ +========= +dm-linear +========= + +Device-Mapper's "linear" target maps a linear range of the Device-Mapper +device onto a linear range of another device. This is the basic building +block of logical volume managers. + +Parameters: + : + Full pathname to the underlying block-device, or a + "major:minor" device-number. + : + Starting sector within the device. + + +Example scripts +=============== + +:: + + #!/bin/sh + # Create an identity mapping for a device + echo "0 `blockdev --getsz $1` linear $1 0" | dmsetup create identity + +:: + + #!/bin/sh + # Join 2 devices together + size1=`blockdev --getsz $1` + size2=`blockdev --getsz $2` + echo "0 $size1 linear $1 0 + $size1 $size2 linear $2 0" | dmsetup create joined + +:: + + #!/usr/bin/perl -w + # Split a device into 4M chunks and then join them together in reverse order. + + my $name = "reverse"; + my $extent_size = 4 * 1024 * 2; + my $dev = $ARGV[0]; + my $table = ""; + my $count = 0; + + if (!defined($dev)) { + die("Please specify a device.\n"); + } + + my $dev_size = `blockdev --getsz $dev`; + my $extents = int($dev_size / $extent_size) - + (($dev_size % $extent_size) ? 
1 : 0); + + while ($extents > 0) { + my $this_start = $count * $extent_size; + $extents--; + $count++; + my $this_offset = $extents * $extent_size; + + $table .= "$this_start $extent_size linear $dev $this_offset\n"; + } + + `echo \"$table\" | dmsetup create $name`; diff --git a/Documentation/device-mapper/linear.txt b/Documentation/device-mapper/linear.txt deleted file mode 100644 index 7cb98d89d3f8..000000000000 --- a/Documentation/device-mapper/linear.txt +++ /dev/null @@ -1,61 +0,0 @@ -dm-linear -========= - -Device-Mapper's "linear" target maps a linear range of the Device-Mapper -device onto a linear range of another device. This is the basic building -block of logical volume managers. - -Parameters: - : Full pathname to the underlying block-device, or a - "major:minor" device-number. - : Starting sector within the device. - - -Example scripts -=============== -[[ -#!/bin/sh -# Create an identity mapping for a device -echo "0 `blockdev --getsz $1` linear $1 0" | dmsetup create identity -]] - - -[[ -#!/bin/sh -# Join 2 devices together -size1=`blockdev --getsz $1` -size2=`blockdev --getsz $2` -echo "0 $size1 linear $1 0 -$size1 $size2 linear $2 0" | dmsetup create joined -]] - - -[[ -#!/usr/bin/perl -w -# Split a device into 4M chunks and then join them together in reverse order. - -my $name = "reverse"; -my $extent_size = 4 * 1024 * 2; -my $dev = $ARGV[0]; -my $table = ""; -my $count = 0; - -if (!defined($dev)) { - die("Please specify a device.\n"); -} - -my $dev_size = `blockdev --getsz $dev`; -my $extents = int($dev_size / $extent_size) - - (($dev_size % $extent_size) ? 
1 : 0); - -while ($extents > 0) { - my $this_start = $count * $extent_size; - $extents--; - $count++; - my $this_offset = $extents * $extent_size; - - $table .= "$this_start $extent_size linear $dev $this_offset\n"; -} - -`echo \"$table\" | dmsetup create $name`; -]] diff --git a/Documentation/device-mapper/log-writes.txt b/Documentation/device-mapper/log-writes.rst similarity index 61% rename from Documentation/device-mapper/log-writes.txt rename to Documentation/device-mapper/log-writes.rst index b638d124be6a..23141f2ffb7c 100644 --- a/Documentation/device-mapper/log-writes.txt +++ b/Documentation/device-mapper/log-writes.rst @@ -1,3 +1,4 @@ +============= dm-log-writes ============= @@ -25,11 +26,11 @@ completed WRITEs, at the time the REQ_PREFLUSH is issued, are added in order to simulate the worst case scenario with regard to power failures. Consider the following example (W means write, C means complete): -W1,W2,W3,C3,C2,Wflush,C1,Cflush + W1,W2,W3,C3,C2,Wflush,C1,Cflush -The log would show the following +The log would show the following: -W3,W2,flush,W1.... + W3,W2,flush,W1.... Again this is to simulate what is actually on disk, this allows us to detect cases where a power failure at a particular point in time would create an @@ -42,11 +43,11 @@ Any REQ_OP_DISCARD requests are treated like WRITE requests. Otherwise we would have all the DISCARD requests, and then the WRITE requests and then the FLUSH request. Consider the following example: -WRITE block 1, DISCARD block 1, FLUSH + WRITE block 1, DISCARD block 1, FLUSH -If we logged DISCARD when it completed, the replay would look like this +If we logged DISCARD when it completed, the replay would look like this: -DISCARD 1, WRITE 1, FLUSH + DISCARD 1, WRITE 1, FLUSH which isn't quite what happened and wouldn't be caught during the log replay. @@ -57,15 +58,19 @@ i) Constructor log-writes - dev_path : Device that all of the IO will go to normally. - log_dev_path : Device where the log entries are written to. 
+ ============= ============================================== + dev_path Device that all of the IO will go to normally. + log_dev_path Device where the log entries are written to. + ============= ============================================== ii) Status <#logged entries> - #logged entries : Number of logged entries - highest allocated sector : Highest allocated sector + =========================== ======================== + #logged entries Number of logged entries + highest allocated sector Highest allocated sector + =========================== ======================== iii) Messages @@ -75,15 +80,15 @@ iii) Messages For example say you want to fsck a file system after every write, but first you need to replay up to the mkfs to make sure we're fsck'ing something reasonable, you would do something like - this: + this:: mkfs.btrfs -f /dev/mapper/log dmsetup message log 0 mark mkfs - This would allow you to replay the log up to the mkfs mark and - then replay from that point on doing the fsck check in the - interval that you want. + This would allow you to replay the log up to the mkfs mark and + then replay from that point on doing the fsck check in the + interval that you want. Every log has a mark at the end labeled "dm-log-writes-end". @@ -97,42 +102,42 @@ Example usage ============= Say you want to test fsync on your file system. 
You would do something like -this: +this:: -TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc" -dmsetup create log --table "$TABLE" -mkfs.btrfs -f /dev/mapper/log -dmsetup message log 0 mark mkfs + TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc" + dmsetup create log --table "$TABLE" + mkfs.btrfs -f /dev/mapper/log + dmsetup message log 0 mark mkfs -mount /dev/mapper/log /mnt/btrfs-test - -dmsetup message log 0 mark fsync -md5sum /mnt/btrfs-test/foo -umount /mnt/btrfs-test + mount /dev/mapper/log /mnt/btrfs-test + + dmsetup message log 0 mark fsync + md5sum /mnt/btrfs-test/foo + umount /mnt/btrfs-test -dmsetup remove log -replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync -mount /dev/sdb /mnt/btrfs-test -md5sum /mnt/btrfs-test/foo - + dmsetup remove log + replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync + mount /dev/sdb /mnt/btrfs-test + md5sum /mnt/btrfs-test/foo + -Another option is to do a complicated file system operation and verify the file -system is consistent during the entire operation. You could do this with: + Another option is to do a complicated file system operation and verify the file + system is consistent during the entire operation. 
You could do this with: -TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc" -dmsetup create log --table "$TABLE" -mkfs.btrfs -f /dev/mapper/log -dmsetup message log 0 mark mkfs + TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc" + dmsetup create log --table "$TABLE" + mkfs.btrfs -f /dev/mapper/log + dmsetup message log 0 mark mkfs -mount /dev/mapper/log /mnt/btrfs-test - -btrfs filesystem balance /mnt/btrfs-test -umount /mnt/btrfs-test -dmsetup remove log + mount /dev/mapper/log /mnt/btrfs-test + + btrfs filesystem balance /mnt/btrfs-test + umount /mnt/btrfs-test + dmsetup remove log -replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs -btrfsck /dev/sdb -replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \ + replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs + btrfsck /dev/sdb + replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \ --fsck "btrfsck /dev/sdb" --check fua And that will replay the log until it sees a FUA request, run the fsck command diff --git a/Documentation/device-mapper/persistent-data.txt b/Documentation/device-mapper/persistent-data.rst similarity index 98% rename from Documentation/device-mapper/persistent-data.txt rename to Documentation/device-mapper/persistent-data.rst index a333bcb3a6c2..2065c3c5a091 100644 --- a/Documentation/device-mapper/persistent-data.txt +++ b/Documentation/device-mapper/persistent-data.rst @@ -1,3 +1,7 @@ +=============== +Persistent data +=============== + Introduction ============ diff --git a/Documentation/device-mapper/snapshot.txt b/Documentation/device-mapper/snapshot.rst similarity index 62% rename from Documentation/device-mapper/snapshot.txt rename to Documentation/device-mapper/snapshot.rst index b8bbb516f989..4c53304e72f1 100644 --- a/Documentation/device-mapper/snapshot.txt +++ b/Documentation/device-mapper/snapshot.rst @@ -1,15 +1,16 @@ +============================== Device-mapper snapshot support ============================== 
Device-mapper allows you, without massive data copying: -*) To create snapshots of any block device i.e. mountable, saved states of -the block device which are also writable without interfering with the -original content; -*) To create device "forks", i.e. multiple different versions of the -same data stream. -*) To merge a snapshot of a block device back into the snapshot's origin -device. +- To create snapshots of any block device i.e. mountable, saved states of + the block device which are also writable without interfering with the + original content; +- To create device "forks", i.e. multiple different versions of the + same data stream. +- To merge a snapshot of a block device back into the snapshot's origin + device. In the first two cases, dm copies only the chunks of data that get changed and uses a separate copy-on-write (COW) block device for @@ -22,7 +23,7 @@ the origin device. There are three dm targets available: snapshot, snapshot-origin, and snapshot-merge. -*) snapshot-origin +- snapshot-origin which will normally have one or more snapshots based on it. Reads will be mapped directly to the backing device. For each write, the @@ -30,7 +31,7 @@ original data will be saved in the of each snapshot to keep its visible content unchanged, at least until the fills up. -*) snapshot +- snapshot A snapshot of the block device is created. Changed chunks of sectors will be stored on the . Writes will @@ -83,25 +84,25 @@ When you create the first LVM2 snapshot of a volume, four dm devices are used: source volume), whose table is replaced by a "snapshot-origin" mapping from device #1. 
-A fixed naming scheme is used, so with the following commands: +A fixed naming scheme is used, so with the following commands:: -lvcreate -L 1G -n base volumeGroup -lvcreate -L 100M --snapshot -n snap volumeGroup/base + lvcreate -L 1G -n base volumeGroup + lvcreate -L 100M --snapshot -n snap volumeGroup/base -we'll have this situation (with volumes in above order): +we'll have this situation (with volumes in above order):: -# dmsetup table|grep volumeGroup + # dmsetup table|grep volumeGroup -volumeGroup-base-real: 0 2097152 linear 8:19 384 -volumeGroup-snap-cow: 0 204800 linear 8:19 2097536 -volumeGroup-snap: 0 2097152 snapshot 254:11 254:12 P 16 -volumeGroup-base: 0 2097152 snapshot-origin 254:11 + volumeGroup-base-real: 0 2097152 linear 8:19 384 + volumeGroup-snap-cow: 0 204800 linear 8:19 2097536 + volumeGroup-snap: 0 2097152 snapshot 254:11 254:12 P 16 + volumeGroup-base: 0 2097152 snapshot-origin 254:11 -# ls -lL /dev/mapper/volumeGroup-* -brw------- 1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real -brw------- 1 root root 254, 12 29 ago 18:15 /dev/mapper/volumeGroup-snap-cow -brw------- 1 root root 254, 13 29 ago 18:15 /dev/mapper/volumeGroup-snap -brw------- 1 root root 254, 10 29 ago 18:14 /dev/mapper/volumeGroup-base + # ls -lL /dev/mapper/volumeGroup-* + brw------- 1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real + brw------- 1 root root 254, 12 29 ago 18:15 /dev/mapper/volumeGroup-snap-cow + brw------- 1 root root 254, 13 29 ago 18:15 /dev/mapper/volumeGroup-snap + brw------- 1 root root 254, 10 29 ago 18:14 /dev/mapper/volumeGroup-base How snapshot-merge is used by LVM2 @@ -114,27 +115,28 @@ merging snapshot after it completes. The "snapshot" that hands over its COW device to the "snapshot-merge" is deactivated (unless using lvchange --refresh); but if it is left active it will simply return I/O errors. 
-A snapshot will merge into its origin with the following command: +A snapshot will merge into its origin with the following command:: -lvconvert --merge volumeGroup/snap + lvconvert --merge volumeGroup/snap -we'll now have this situation: +we'll now have this situation:: -# dmsetup table|grep volumeGroup + # dmsetup table|grep volumeGroup -volumeGroup-base-real: 0 2097152 linear 8:19 384 -volumeGroup-base-cow: 0 204800 linear 8:19 2097536 -volumeGroup-base: 0 2097152 snapshot-merge 254:11 254:12 P 16 + volumeGroup-base-real: 0 2097152 linear 8:19 384 + volumeGroup-base-cow: 0 204800 linear 8:19 2097536 + volumeGroup-base: 0 2097152 snapshot-merge 254:11 254:12 P 16 -# ls -lL /dev/mapper/volumeGroup-* -brw------- 1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real -brw------- 1 root root 254, 12 29 ago 18:16 /dev/mapper/volumeGroup-base-cow -brw------- 1 root root 254, 10 29 ago 18:16 /dev/mapper/volumeGroup-base + # ls -lL /dev/mapper/volumeGroup-* + brw------- 1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real + brw------- 1 root root 254, 12 29 ago 18:16 /dev/mapper/volumeGroup-base-cow + brw------- 1 root root 254, 10 29 ago 18:16 /dev/mapper/volumeGroup-base How to determine when a merging is complete =========================================== The snapshot-merge and snapshot status lines end with: + <sectors_allocated>/<total_sectors> <metadata_sectors> Both <sectors_allocated> and <total_sectors> include both data and metadata. @@ -142,35 +144,37 @@ During merging, the number of sectors allocated gets smaller and smaller. Merging has finished when the number of sectors holding data is zero, in other words <sectors_allocated> == <metadata_sectors>. 
-Here is a practical example (using a hybrid of lvm and dmsetup commands): +Here is a practical example (using a hybrid of lvm and dmsetup commands):: -# lvs - LV VG Attr LSize Origin Snap% Move Log Copy% Convert - base volumeGroup owi-a- 4.00g - snap volumeGroup swi-a- 1.00g base 18.97 + # lvs + LV VG Attr LSize Origin Snap% Move Log Copy% Convert + base volumeGroup owi-a- 4.00g + snap volumeGroup swi-a- 1.00g base 18.97 -# dmsetup status volumeGroup-snap -0 8388608 snapshot 397896/2097152 1560 - ^^^^ metadata sectors + # dmsetup status volumeGroup-snap + 0 8388608 snapshot 397896/2097152 1560 + ^^^^ metadata sectors -# lvconvert --merge -b volumeGroup/snap - Merging of volume snap started. + # lvconvert --merge -b volumeGroup/snap + Merging of volume snap started. -# lvs volumeGroup/snap - LV VG Attr LSize Origin Snap% Move Log Copy% Convert - base volumeGroup Owi-a- 4.00g 17.23 + # lvs volumeGroup/snap + LV VG Attr LSize Origin Snap% Move Log Copy% Convert + base volumeGroup Owi-a- 4.00g 17.23 -# dmsetup status volumeGroup-base -0 8388608 snapshot-merge 281688/2097152 1104 + # dmsetup status volumeGroup-base + 0 8388608 snapshot-merge 281688/2097152 1104 -# dmsetup status volumeGroup-base -0 8388608 snapshot-merge 180480/2097152 712 + # dmsetup status volumeGroup-base + 0 8388608 snapshot-merge 180480/2097152 712 -# dmsetup status volumeGroup-base -0 8388608 snapshot-merge 16/2097152 16 + # dmsetup status volumeGroup-base + 0 8388608 snapshot-merge 16/2097152 16 Merging has finished. 
-# lvs - LV VG Attr LSize Origin Snap% Move Log Copy% Convert - base volumeGroup owi-a- 4.00g +:: + + # lvs + LV VG Attr LSize Origin Snap% Move Log Copy% Convert + base volumeGroup owi-a- 4.00g diff --git a/Documentation/device-mapper/statistics.txt b/Documentation/device-mapper/statistics.rst similarity index 87% rename from Documentation/device-mapper/statistics.txt rename to Documentation/device-mapper/statistics.rst index 170ac02a1f50..3d80a9f850cc 100644 --- a/Documentation/device-mapper/statistics.txt +++ b/Documentation/device-mapper/statistics.rst @@ -1,3 +1,4 @@ +============= DM statistics ============= @@ -11,7 +12,7 @@ Individual statistics will be collected for each step-sized area within the range specified. The I/O statistics counters for each step-sized area of a region are -in the same format as /sys/block/*/stat or /proc/diskstats (see: +in the same format as `/sys/block/*/stat` or `/proc/diskstats` (see: Documentation/iostats.txt). But two extra counters (12 and 13) are provided: total time spent reading and writing. When the histogram argument is used, the 14th parameter is reported that represents the @@ -32,40 +33,45 @@ on each other's data. The creation of DM statistics will allocate memory via kmalloc or fallback to using vmalloc space. At most, 1/4 of the overall system memory may be allocated by DM statistics. The admin can see how much -memory is used by reading -/sys/module/dm_mod/parameters/stats_current_allocated_bytes +memory is used by reading: + + /sys/module/dm_mod/parameters/stats_current_allocated_bytes Messages ======== - @stats_create - [ ...] - [ []] - + @stats_create [ ...] [ []] Create a new region and return the region_id. - "-" - whole device - "+" - a range of 512-byte sectors - starting with . + "-" + whole device + "+" + a range of 512-byte sectors + starting with . - "" - the range is subdivided into areas each containing - sectors. - "/" - the range is subdivided into the specified - number of areas. 
+ "" + the range is subdivided into areas each containing + sectors. + "/" + the range is subdivided into the specified + number of areas. The number of optional arguments - The following optional arguments are supported - precise_timestamps - use precise timer with nanosecond resolution + The following optional arguments are supported: + + precise_timestamps + use precise timer with nanosecond resolution instead of the "jiffies" variable. When this argument is used, the resulting times are in nanoseconds instead of milliseconds. Precise timestamps are a little bit slower to obtain than jiffies-based timestamps. - histogram:n1,n2,n3,n4,... - collect histogram of latencies. The + histogram:n1,n2,n3,n4,... + collect histogram of latencies. The numbers n1, n2, etc are times that represent the boundaries of the histogram. If precise_timestamps is not used, the times are in milliseconds, otherwise they are in @@ -96,21 +102,18 @@ Messages @stats_list message, but it doesn't use this value for anything. @stats_delete - Delete the region with the specified id. region_id returned from @stats_create @stats_clear - Clear all the counters except the in-flight i/o counters. region_id returned from @stats_create @stats_list [] - List all regions registered with @stats_create. @@ -127,7 +130,6 @@ Messages if they were specified when creating the region. @stats_print [ ] - Print counters for each step-sized area of a region. @@ -143,10 +145,11 @@ Messages Output format for each step-sized area of a region: - + counters + + + counters The first 11 counters have the same meaning as - /sys/block/*/stat or /proc/diskstats. + `/sys/block/*/stat or /proc/diskstats`. Please refer to Documentation/iostats.txt for details. @@ -163,11 +166,11 @@ Messages 11. the weighted number of milliseconds spent doing I/Os Additional counters: + 12. the total time spent reading in milliseconds 13. 
the total time spent writing in milliseconds @stats_print_clear [ ] - Atomically print and then clear all the counters except the in-flight i/o counters. Useful when the client consuming the statistics does not want to lose any statistics (those updated @@ -185,7 +188,6 @@ Messages If omitted, all lines are printed and then cleared. @stats_set_aux - Store auxiliary data aux_data for the specified region. @@ -201,23 +203,23 @@ Examples ======== Subdivide the DM device 'vol' into 100 pieces and start collecting -statistics on them: +statistics on them:: dmsetup message vol 0 @stats_create - /100 Set the auxiliary data string to "foo bar baz" (the escape for each -space must also be escaped, otherwise the shell will consume them): +space must also be escaped, otherwise the shell will consume them):: dmsetup message vol 0 @stats_set_aux 0 foo\\ bar\\ baz -List the statistics: +List the statistics:: dmsetup message vol 0 @stats_list -Print the statistics: +Print the statistics:: dmsetup message vol 0 @stats_print 0 -Delete the statistics: +Delete the statistics:: dmsetup message vol 0 @stats_delete 0 diff --git a/Documentation/device-mapper/striped.rst b/Documentation/device-mapper/striped.rst new file mode 100644 index 000000000000..e9a8da192ae1 --- /dev/null +++ b/Documentation/device-mapper/striped.rst @@ -0,0 +1,61 @@ +========= +dm-stripe +========= + +Device-Mapper's "striped" target is used to create a striped (i.e. RAID-0) +device across one or more underlying devices. Data is written in "chunks", +with consecutive chunks rotating among the underlying devices. This can +potentially provide improved I/O throughput by utilizing several physical +devices in parallel. + +Parameters: [ ]+ + : + Number of underlying devices. + : + Size of each chunk of data. Must be at least as + large as the system's PAGE_SIZE. + : + Full pathname to the underlying block-device, or a + "major:minor" device-number. + : + Starting sector within the device. 
+ +One or more underlying devices can be specified. The striped device size must +be a multiple of the chunk size multiplied by the number of underlying devices. + + +Example scripts +=============== + +:: + + #!/usr/bin/perl -w + # Create a striped device across any number of underlying devices. The device + # will be called "stripe_dev" and have a chunk-size of 128k. + + my $chunk_size = 128 * 2; + my $dev_name = "stripe_dev"; + my $num_devs = @ARGV; + my @devs = @ARGV; + my ($min_dev_size, $stripe_dev_size, $i); + + if (!$num_devs) { + die("Specify at least one device\n"); + } + + $min_dev_size = `blockdev --getsz $devs[0]`; + for ($i = 1; $i < $num_devs; $i++) { + my $this_size = `blockdev --getsz $devs[$i]`; + $min_dev_size = ($min_dev_size < $this_size) ? + $min_dev_size : $this_size; + } + + $stripe_dev_size = $min_dev_size * $num_devs; + $stripe_dev_size -= $stripe_dev_size % ($chunk_size * $num_devs); + + $table = "0 $stripe_dev_size striped $num_devs $chunk_size"; + for ($i = 0; $i < $num_devs; $i++) { + $table .= " $devs[$i] 0"; + } + + `echo $table | dmsetup create $dev_name`; diff --git a/Documentation/device-mapper/striped.txt b/Documentation/device-mapper/striped.txt deleted file mode 100644 index 07ec492cceee..000000000000 --- a/Documentation/device-mapper/striped.txt +++ /dev/null @@ -1,57 +0,0 @@ -dm-stripe -========= - -Device-Mapper's "striped" target is used to create a striped (i.e. RAID-0) -device across one or more underlying devices. Data is written in "chunks", -with consecutive chunks rotating among the underlying devices. This can -potentially provide improved I/O throughput by utilizing several physical -devices in parallel. - -Parameters: [ ]+ - : Number of underlying devices. - : Size of each chunk of data. Must be at least as - large as the system's PAGE_SIZE. - : Full pathname to the underlying block-device, or a - "major:minor" device-number. - : Starting sector within the device. - -One or more underlying devices can be specified. 
The striped device size must -be a multiple of the chunk size multiplied by the number of underlying devices. - - -Example scripts -=============== - -[[ -#!/usr/bin/perl -w -# Create a striped device across any number of underlying devices. The device -# will be called "stripe_dev" and have a chunk-size of 128k. - -my $chunk_size = 128 * 2; -my $dev_name = "stripe_dev"; -my $num_devs = @ARGV; -my @devs = @ARGV; -my ($min_dev_size, $stripe_dev_size, $i); - -if (!$num_devs) { - die("Specify at least one device\n"); -} - -$min_dev_size = `blockdev --getsz $devs[0]`; -for ($i = 1; $i < $num_devs; $i++) { - my $this_size = `blockdev --getsz $devs[$i]`; - $min_dev_size = ($min_dev_size < $this_size) ? - $min_dev_size : $this_size; -} - -$stripe_dev_size = $min_dev_size * $num_devs; -$stripe_dev_size -= $stripe_dev_size % ($chunk_size * $num_devs); - -$table = "0 $stripe_dev_size striped $num_devs $chunk_size"; -for ($i = 0; $i < $num_devs; $i++) { - $table .= " $devs[$i] 0"; -} - -`echo $table | dmsetup create $dev_name`; -]] - diff --git a/Documentation/device-mapper/switch.txt b/Documentation/device-mapper/switch.rst similarity index 84% rename from Documentation/device-mapper/switch.txt rename to Documentation/device-mapper/switch.rst index 5bd4831db4a8..7dde06be1a4f 100644 --- a/Documentation/device-mapper/switch.txt +++ b/Documentation/device-mapper/switch.rst @@ -1,3 +1,4 @@ +========= dm-switch ========= @@ -67,27 +68,25 @@ b-tree can achieve. Construction Parameters ======================= - [...] - [ ]+ + [...] [ ]+ + + The number of paths across which to distribute the I/O. - - The number of paths across which to distribute the I/O. + + The number of 512-byte sectors in a region. Each region can be redirected + to any of the available paths. - - The number of 512-byte sectors in a region. Each region can be redirected - to any of the available paths. + + The number of optional arguments. 
Currently, no optional arguments + are supported and so this must be zero. - - The number of optional arguments. Currently, no optional arguments - are supported and so this must be zero. + + The block device that represents a specific path to the device. - - The block device that represents a specific path to the device. - - - The offset of the start of data on the specific (in units - of 512-byte sectors). This number is added to the sector number when - forwarding the request to the specific path. Typically it is zero. + + The offset of the start of data on the specific (in units + of 512-byte sectors). This number is added to the sector number when + forwarding the request to the specific path. Typically it is zero. Messages ======== @@ -122,17 +121,21 @@ Example Assume that you have volumes vg1/switch0 vg1/switch1 vg1/switch2 with the same size. -Create a switch device with 64kB region size: +Create a switch device with 64kB region size:: + dmsetup create switch --table "0 `blockdev --getsz /dev/vg1/switch0` switch 3 128 0 /dev/vg1/switch0 0 /dev/vg1/switch1 0 /dev/vg1/switch2 0" Set mappings for the first 7 entries to point to devices switch0, switch1, -switch2, switch0, switch1, switch2, switch1: +switch2, switch0, switch1, switch2, switch1:: + dmsetup message switch 0 set_region_mappings 0:0 :1 :2 :0 :1 :2 :1 -Set repetitive mapping. This command: +Set repetitive mapping. 
This command:: + dmsetup message switch 0 set_region_mappings 1000:1 :2 R2,10 -is equivalent to: + +is equivalent to:: + dmsetup message switch 0 set_region_mappings 1000:1 :2 :1 :2 :1 :2 :1 :2 \ :1 :2 :1 :2 :1 :2 :1 :2 :1 :2 - diff --git a/Documentation/device-mapper/thin-provisioning.txt b/Documentation/device-mapper/thin-provisioning.rst similarity index 92% rename from Documentation/device-mapper/thin-provisioning.txt rename to Documentation/device-mapper/thin-provisioning.rst index 883e7ca5f745..bafebf79da4b 100644 --- a/Documentation/device-mapper/thin-provisioning.txt +++ b/Documentation/device-mapper/thin-provisioning.rst @@ -1,3 +1,7 @@ +================= +Thin provisioning +================= + Introduction ============ @@ -95,6 +99,8 @@ previously.) Using an existing pool device ----------------------------- +:: + dmsetup create pool \ --table "0 20971520 thin-pool $metadata_dev $data_dev \ $data_block_size $low_water_mark" @@ -154,7 +160,7 @@ Thin provisioning i) Creating a new thinly-provisioned volume. To create a new thinly- provisioned volume you must send a message to an - active pool device, /dev/mapper/pool in this example. + active pool device, /dev/mapper/pool in this example:: dmsetup message /dev/mapper/pool 0 "create_thin 0" @@ -164,7 +170,7 @@ i) Creating a new thinly-provisioned volume. ii) Using a thinly-provisioned volume. - Thinly-provisioned volumes are activated using the 'thin' target: + Thinly-provisioned volumes are activated using the 'thin' target:: dmsetup create thin --table "0 2097152 thin /dev/mapper/pool 0" @@ -181,6 +187,8 @@ i) Creating an internal snapshot. must suspend it before creating the snapshot to avoid corruption. This is NOT enforced at the moment, so please be careful! + :: + dmsetup suspend /dev/mapper/thin dmsetup message /dev/mapper/pool 0 "create_snap 1 0" dmsetup resume /dev/mapper/thin @@ -198,14 +206,14 @@ ii) Using an internal snapshot. activating or removing them both. 
(This differs from conventional device-mapper snapshots.) - Activate it exactly the same way as any other thinly-provisioned volume: + Activate it exactly the same way as any other thinly-provisioned volume:: dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 1" External snapshots ------------------ -You can use an external _read only_ device as an origin for a +You can use an external **read only** device as an origin for a thinly-provisioned volume. Any read to an unprovisioned area of the thin device will be passed through to the origin. Writes trigger the allocation of new blocks as usual. @@ -223,11 +231,13 @@ i) Creating a snapshot of an external device This is the same as creating a thin device. You don't mention the origin at this stage. + :: + dmsetup message /dev/mapper/pool 0 "create_thin 0" ii) Using a snapshot of an external device. - Append an extra parameter to the thin target specifying the origin: + Append an extra parameter to the thin target specifying the origin:: dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 0 /dev/image" @@ -240,6 +250,8 @@ Deactivation All devices using a pool must be deactivated before the pool itself can be. +:: + dmsetup remove thin dmsetup remove snap dmsetup remove pool @@ -252,25 +264,32 @@ Reference i) Constructor - thin-pool \ - [ []*] + :: + + thin-pool \ + [ []*] Optional feature arguments: - skip_block_zeroing: Skip the zeroing of newly-provisioned blocks. + skip_block_zeroing: + Skip the zeroing of newly-provisioned blocks. - ignore_discard: Disable discard support. + ignore_discard: + Disable discard support. - no_discard_passdown: Don't pass discards down to the underlying - data device, but just remove the mapping. + no_discard_passdown: + Don't pass discards down to the underlying + data device, but just remove the mapping. - read_only: Don't allow any changes to be made to the pool + read_only: + Don't allow any changes to be made to the pool metadata. 
This mode is only available after the thin-pool has been created and first used in full read/write mode. It cannot be specified on initial thin-pool creation. - error_if_no_space: Error IOs, instead of queueing, if no space. + error_if_no_space: + Error IOs, instead of queueing, if no space. Data block size must be between 64KB (128 sectors) and 1GB (2097152 sectors) inclusive. @@ -278,10 +297,12 @@ i) Constructor ii) Status - / - / - ro|rw|out_of_data_space [no_]discard_passdown [error|queue]_if_no_space - needs_check|- metadata_low_watermark + :: + + / + / + ro|rw|out_of_data_space [no_]discard_passdown [error|queue]_if_no_space + needs_check|- metadata_low_watermark transaction id: A 64-bit number used by userspace to help synchronise with metadata @@ -336,13 +357,11 @@ ii) Status iii) Messages create_thin - Create a new thinly-provisioned device. is an arbitrary unique 24-bit identifier chosen by the caller. create_snap - Create a new snapshot of another thinly-provisioned device. is an arbitrary unique 24-bit identifier chosen by the caller. @@ -350,11 +369,9 @@ iii) Messages of which the new device will be a snapshot. delete - Deletes a thin device. Irreversible. set_transaction_id - Userland volume managers, such as LVM, need a way to synchronise their external metadata with the internal metadata of the pool target. The thin-pool target offers to store an @@ -364,14 +381,12 @@ iii) Messages compare-and-swap message. reserve_metadata_snap - Reserve a copy of the data mapping btree for use by userland. This allows userland to inspect the mappings as they were when this message was executed. Use the pool's status command to get the root block associated with the metadata snapshot. release_metadata_snap - Release a previously reserved copy of the data mapping btree. 'thin' target @@ -379,7 +394,9 @@ iii) Messages i) Constructor - thin [] + :: + + thin [] pool dev: the thin-pool device, e.g. 
/dev/mapper/my_pool or 253:0 @@ -401,8 +418,7 @@ provisioned as and when needed. ii) Status - - + If the pool has encountered device errors and failed, the status will just contain the string 'Fail'. The userspace recovery tools should then be used. diff --git a/Documentation/device-mapper/unstriped.txt b/Documentation/device-mapper/unstriped.rst similarity index 60% rename from Documentation/device-mapper/unstriped.txt rename to Documentation/device-mapper/unstriped.rst index 0b2a306c54ee..0a8d3eb3f072 100644 --- a/Documentation/device-mapper/unstriped.txt +++ b/Documentation/device-mapper/unstriped.rst @@ -1,3 +1,7 @@ +================================ +Device-mapper "unstriped" target +================================ + Introduction ============ @@ -34,46 +38,46 @@ striped target to combine the 4 devices into one. It then will use the unstriped target ontop of the striped device to access the individual backing loop devices. We write data to the newly exposed unstriped devices and verify the data written matches the correct -underlying device on the striped array. 
+underlying device on the striped array:: -#!/bin/bash + #!/bin/bash -MEMBER_SIZE=$((128 * 1024 * 1024)) -NUM=4 -SEQ_END=$((${NUM}-1)) -CHUNK=256 -BS=4096 + MEMBER_SIZE=$((128 * 1024 * 1024)) + NUM=4 + SEQ_END=$((${NUM}-1)) + CHUNK=256 + BS=4096 -RAID_SIZE=$((${MEMBER_SIZE}*${NUM}/512)) -DM_PARMS="0 ${RAID_SIZE} striped ${NUM} ${CHUNK}" -COUNT=$((${MEMBER_SIZE} / ${BS})) + RAID_SIZE=$((${MEMBER_SIZE}*${NUM}/512)) + DM_PARMS="0 ${RAID_SIZE} striped ${NUM} ${CHUNK}" + COUNT=$((${MEMBER_SIZE} / ${BS})) -for i in $(seq 0 ${SEQ_END}); do - dd if=/dev/zero of=member-${i} bs=${MEMBER_SIZE} count=1 oflag=direct - losetup /dev/loop${i} member-${i} - DM_PARMS+=" /dev/loop${i} 0" -done + for i in $(seq 0 ${SEQ_END}); do + dd if=/dev/zero of=member-${i} bs=${MEMBER_SIZE} count=1 oflag=direct + losetup /dev/loop${i} member-${i} + DM_PARMS+=" /dev/loop${i} 0" + done -echo $DM_PARMS | dmsetup create raid0 -for i in $(seq 0 ${SEQ_END}); do - echo "0 1 unstriped ${NUM} ${CHUNK} ${i} /dev/mapper/raid0 0" | dmsetup create set-${i} -done; + echo $DM_PARMS | dmsetup create raid0 + for i in $(seq 0 ${SEQ_END}); do + echo "0 1 unstriped ${NUM} ${CHUNK} ${i} /dev/mapper/raid0 0" | dmsetup create set-${i} + done; -for i in $(seq 0 ${SEQ_END}); do - dd if=/dev/urandom of=/dev/mapper/set-${i} bs=${BS} count=${COUNT} oflag=direct - diff /dev/mapper/set-${i} member-${i} -done; + for i in $(seq 0 ${SEQ_END}); do + dd if=/dev/urandom of=/dev/mapper/set-${i} bs=${BS} count=${COUNT} oflag=direct + diff /dev/mapper/set-${i} member-${i} + done; -for i in $(seq 0 ${SEQ_END}); do - dmsetup remove set-${i} -done + for i in $(seq 0 ${SEQ_END}); do + dmsetup remove set-${i} + done -dmsetup remove raid0 + dmsetup remove raid0 -for i in $(seq 0 ${SEQ_END}); do - losetup -d /dev/loop${i} - rm -f member-${i} -done + for i in $(seq 0 ${SEQ_END}); do + losetup -d /dev/loop${i} + rm -f member-${i} + done Another example --------------- @@ -81,7 +85,7 @@ Another example Intel NVMe drives contain two cores on the 
physical device. Each core of the drive has segregated access to its LBA range. The current LBA model has a RAID 0 128k chunk on each core, resulting -in a 256k stripe across the two cores: +in a 256k stripe across the two cores:: Core 0: Core 1: __________ __________ @@ -108,17 +112,24 @@ Example dmsetup usage unstriped ontop of Intel NVMe device that has 2 cores ----------------------------------------------------- -dmsetup create nvmset0 --table '0 512 unstriped 2 256 0 /dev/nvme0n1 0' -dmsetup create nvmset1 --table '0 512 unstriped 2 256 1 /dev/nvme0n1 0' + +:: + + dmsetup create nvmset0 --table '0 512 unstriped 2 256 0 /dev/nvme0n1 0' + dmsetup create nvmset1 --table '0 512 unstriped 2 256 1 /dev/nvme0n1 0' There will now be two devices that expose Intel NVMe core 0 and 1 -respectively: -/dev/mapper/nvmset0 -/dev/mapper/nvmset1 +respectively:: + + /dev/mapper/nvmset0 + /dev/mapper/nvmset1 unstriped ontop of striped with 4 drives using 128K chunk size -------------------------------------------------------------- -dmsetup create raid_disk0 --table '0 512 unstriped 4 256 0 /dev/mapper/striped 0' -dmsetup create raid_disk1 --table '0 512 unstriped 4 256 1 /dev/mapper/striped 0' -dmsetup create raid_disk2 --table '0 512 unstriped 4 256 2 /dev/mapper/striped 0' -dmsetup create raid_disk3 --table '0 512 unstriped 4 256 3 /dev/mapper/striped 0' + +:: + + dmsetup create raid_disk0 --table '0 512 unstriped 4 256 0 /dev/mapper/striped 0' + dmsetup create raid_disk1 --table '0 512 unstriped 4 256 1 /dev/mapper/striped 0' + dmsetup create raid_disk2 --table '0 512 unstriped 4 256 2 /dev/mapper/striped 0' + dmsetup create raid_disk3 --table '0 512 unstriped 4 256 3 /dev/mapper/striped 0' diff --git a/Documentation/device-mapper/verity.txt b/Documentation/device-mapper/verity.rst similarity index 98% rename from Documentation/device-mapper/verity.txt rename to Documentation/device-mapper/verity.rst index b3d2e4a42255..a4d1c1476d72 100644 --- 
a/Documentation/device-mapper/verity.txt +++ b/Documentation/device-mapper/verity.rst @@ -1,5 +1,6 @@ +========= dm-verity -========== +========= Device-Mapper's "verity" target provides transparent integrity checking of block devices using a cryptographic digest provided by the kernel crypto API. @@ -7,6 +8,9 @@ This target is read-only. Construction Parameters ======================= + +:: + @@ -160,7 +164,9 @@ calculating the parent node. The tree looks something like: -alg = sha256, num_blocks = 32768, block_size = 4096 + alg = sha256, num_blocks = 32768, block_size = 4096 + +:: [ root ] / . . . \ @@ -189,6 +195,7 @@ block boundary) are the hash blocks which are stored a depth at a time The full specification of kernel parameters and on-disk metadata format is available at the cryptsetup project's wiki page + https://gitlab.com/cryptsetup/cryptsetup/wikis/DMVerity Status @@ -198,7 +205,8 @@ If any check failed, C (for Corruption) is returned. Example ======= -Set up a device: +Set up a device:: + # dmsetup create vroot --readonly --table \ "0 2097152 verity 1 /dev/sda1 /dev/sda2 4096 4096 262144 1 sha256 "\ "4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076 "\ @@ -209,11 +217,13 @@ the hash tree or activate the kernel device. This is available from the cryptsetup upstream repository https://gitlab.com/cryptsetup/cryptsetup/ (as a libcryptsetup extension). -Create hash on the device: +Create hash on the device:: + # veritysetup format /dev/sda1 /dev/sda2 ... 
Root hash: 4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076 -Activate the device: +Activate the device:: + # veritysetup create vroot /dev/sda1 /dev/sda2 \ 4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076 diff --git a/Documentation/device-mapper/writecache.txt b/Documentation/device-mapper/writecache.rst similarity index 96% rename from Documentation/device-mapper/writecache.txt rename to Documentation/device-mapper/writecache.rst index 01532b3008ae..d3d7690f5e8d 100644 --- a/Documentation/device-mapper/writecache.txt +++ b/Documentation/device-mapper/writecache.rst @@ -1,3 +1,7 @@ +================= +Writecache target +================= + The writecache target caches writes on persistent memory or on SSD. It doesn't cache reads because reads are supposed to be cached in page cache in normal RAM. @@ -6,15 +10,18 @@ When the device is constructed, the first sector should be zeroed or the first sector should contain valid superblock from previous invocation. Constructor parameters: + 1. type of the cache device - "p" or "s" - p - persistent memory - s - SSD + + - p - persistent memory + - s - SSD 2. the underlying device that will be cached 3. the cache device 4. block size (4096 is recommended; the maximum block size is the page size) 5. the number of optional parameters (the parameters with an argument count as two) + start_sector n (default: 0) offset from the start of cache device in 512-byte sectors high_watermark n (default: 50) @@ -43,6 +50,7 @@ Constructor parameters: applicable only to persistent memory - don't use the FUA flag when writing back data and send the FLUSH request afterwards + - some underlying devices perform better with fua, some with nofua. The user should test it @@ -60,6 +68,7 @@ Messages: flush the cache device on next suspend. Use this message when you are going to remove the cache device. The proper sequence for removing the cache device is: + 1. send the "flush_on_suspend" message 2. 
load an inactive table with a linear target that maps to the underlying device diff --git a/Documentation/device-mapper/zero.txt b/Documentation/device-mapper/zero.rst similarity index 83% rename from Documentation/device-mapper/zero.txt rename to Documentation/device-mapper/zero.rst index 20fb38e7fa7e..11fb5cf4597c 100644 --- a/Documentation/device-mapper/zero.txt +++ b/Documentation/device-mapper/zero.rst @@ -1,3 +1,4 @@ +======= dm-zero ======= @@ -18,20 +19,19 @@ filesystem limitations. To create a sparse device, start by creating a dm-zero device that's the desired size of the sparse device. For this example, we'll assume a 10TB -sparse device. +sparse device:: -TEN_TERABYTES=`expr 10 \* 1024 \* 1024 \* 1024 \* 2` # 10 TB in sectors -echo "0 $TEN_TERABYTES zero" | dmsetup create zero1 + TEN_TERABYTES=`expr 10 \* 1024 \* 1024 \* 1024 \* 2` # 10 TB in sectors + echo "0 $TEN_TERABYTES zero" | dmsetup create zero1 Then create a snapshot of the zero device, using any available block-device as the COW device. The size of the COW device will determine the amount of real space available to the sparse device. For this example, we'll assume /dev/sdb1 -is an available 10GB partition. +is an available 10GB partition:: -echo "0 $TEN_TERABYTES snapshot /dev/mapper/zero1 /dev/sdb1 p 128" | \ - dmsetup create sparse1 + echo "0 $TEN_TERABYTES snapshot /dev/mapper/zero1 /dev/sdb1 p 128" | \ + dmsetup create sparse1 This will create a 10TB sparse device called /dev/mapper/sparse1 that has 10GB of actual storage space available. If more than 10GB of data is written to this device, it will start returning I/O errors. - diff --git a/Documentation/filesystems/ubifs-authentication.md b/Documentation/filesystems/ubifs-authentication.md index 028b3e2e25f9..23e698167141 100644 --- a/Documentation/filesystems/ubifs-authentication.md +++ b/Documentation/filesystems/ubifs-authentication.md @@ -417,9 +417,9 @@ will then have to be provided beforehand in the normal way. 
[DMC-CBC-ATTACK] http://www.jakoblell.com/blog/2013/12/22/practical-malleability-attack-against-cbc-encrypted-luks-partitions/ -[DM-INTEGRITY] https://www.kernel.org/doc/Documentation/device-mapper/dm-integrity.txt +[DM-INTEGRITY] https://www.kernel.org/doc/Documentation/device-mapper/dm-integrity.rst -[DM-VERITY] https://www.kernel.org/doc/Documentation/device-mapper/verity.txt +[DM-VERITY] https://www.kernel.org/doc/Documentation/device-mapper/verity.rst [FSCRYPT-POLICY2] https://www.spinics.net/lists/linux-ext4/msg58710.html diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig index 45254b3ef715..5ccac0b77f17 100644 --- a/drivers/md/Kconfig +++ b/drivers/md/Kconfig @@ -453,7 +453,7 @@ config DM_INIT Enable "dm-mod.create=" parameter to create mapped devices at init time. This option is useful to allow mounting rootfs without requiring an initramfs. - See Documentation/device-mapper/dm-init.txt for dm-mod.create="..." + See Documentation/device-mapper/dm-init.rst for dm-mod.create="..." format. If unsure, say N. diff --git a/drivers/md/dm-init.c b/drivers/md/dm-init.c index 352e803f566e..a58d0944f592 100644 --- a/drivers/md/dm-init.c +++ b/drivers/md/dm-init.c @@ -25,7 +25,7 @@ static char *create; * Format: dm-mod.create=,,,,
[,
+][;,,,,
[,
+]+] * Table format: * - * See Documentation/device-mapper/dm-init.txt for dm-mod.create="..." format + * See Documentation/device-mapper/dm-init.rst for dm-mod.create="..." format * details. */ diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c index 9fdef6897316..7a87a640f8ba 100644 --- a/drivers/md/dm-raid.c +++ b/drivers/md/dm-raid.c @@ -3558,7 +3558,7 @@ static void raid_status(struct dm_target *ti, status_type_t type, * v1.5.0+: * * Sync action: - * See Documentation/device-mapper/dm-raid.txt for + * See Documentation/device-mapper/dm-raid.rst for * information on each of these states. */ DMEMIT(" %s", sync_action); From 10ffebbed5503b1830c7920ef528075785351be6 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Wed, 12 Jun 2019 14:52:44 -0300 Subject: [PATCH 073/129] docs: fault-injection: convert docs to ReST and rename to *.rst The conversion is actually: - add blank lines and indentation in order to identify paragraphs; - fix tables markups; - add some lists markups; - mark literal blocks; - adjust title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab Acked-by: Federico Vaga Signed-off-by: Jonathan Corbet --- ...ault-injection.txt => fault-injection.rst} | 265 +++++++++--------- Documentation/fault-injection/index.rst | 20 ++ ...r-inject.txt => notifier-error-inject.rst} | 18 +- .../fault-injection/nvme-fault-injection.rst | 120 ++++++++ .../fault-injection/nvme-fault-injection.txt | 116 -------- .../fault-injection/provoke-crashes.rst | 48 ++++ .../fault-injection/provoke-crashes.txt | 38 --- Documentation/process/4.Coding.rst | 2 +- .../translations/it_IT/process/4.Coding.rst | 2 +- .../translations/zh_CN/process/4.Coding.rst | 2 +- drivers/misc/lkdtm/core.c | 2 +- include/linux/fault-inject.h | 2 +- lib/Kconfig.debug | 2 +- tools/testing/fault-injection/failcmd.sh | 2 +- 14 files changed, 344 insertions(+), 295 deletions(-) rename Documentation/fault-injection/{fault-injection.txt => fault-injection.rst} (68%) create mode 100644 Documentation/fault-injection/index.rst rename Documentation/fault-injection/{notifier-error-inject.txt => notifier-error-inject.rst} (83%) create mode 100644 Documentation/fault-injection/nvme-fault-injection.rst delete mode 100644 Documentation/fault-injection/nvme-fault-injection.txt create mode 100644 Documentation/fault-injection/provoke-crashes.rst delete mode 100644 Documentation/fault-injection/provoke-crashes.txt diff --git a/Documentation/fault-injection/fault-injection.txt b/Documentation/fault-injection/fault-injection.rst similarity index 68% rename from Documentation/fault-injection/fault-injection.txt rename to Documentation/fault-injection/fault-injection.rst index a17517a083c3..f51bb21d20e4 100644 --- a/Documentation/fault-injection/fault-injection.txt +++ b/Documentation/fault-injection/fault-injection.rst @@ -1,3 +1,4 @@ +=========================================== Fault injection capabilities infrastructure =========================================== @@ -7,36 +8,36 @@ See also drivers/md/md-faulty.c and "every_nth" module option 
for scsi_debug. Available fault injection capabilities -------------------------------------- -o failslab +- failslab injects slab allocation failures. (kmalloc(), kmem_cache_alloc(), ...) -o fail_page_alloc +- fail_page_alloc injects page allocation failures. (alloc_pages(), get_free_pages(), ...) -o fail_futex +- fail_futex injects futex deadlock and uaddr fault errors. -o fail_make_request +- fail_make_request injects disk IO errors on devices permitted by setting /sys/block//make-it-fail or /sys/block///make-it-fail. (generic_make_request()) -o fail_mmc_request +- fail_mmc_request injects MMC data errors on devices permitted by setting debugfs entries under /sys/kernel/debug/mmc0/fail_mmc_request -o fail_function +- fail_function injects error return on specific functions, which are marked by ALLOW_ERROR_INJECTION() macro, by setting debugfs entries under /sys/kernel/debug/fail_function. No boot option supported. -o NVMe fault injection +- NVMe fault injection inject NVMe status code and retry flag on devices permitted by setting debugfs entries under /sys/kernel/debug/nvme*/fault_inject. The default @@ -47,7 +48,8 @@ o NVMe fault injection Configure fault-injection capabilities behavior ----------------------------------------------- -o debugfs entries +debugfs entries +^^^^^^^^^^^^^^^ fault-inject-debugfs kernel module provides some debugfs entries for runtime configuration of fault-injection capabilities. @@ -55,6 +57,7 @@ configuration of fault-injection capabilities. - /sys/kernel/debug/fail*/probability: likelihood of failure injection, in percent. + Format: Note that one-failure-per-hundred is a very high error rate @@ -83,6 +86,7 @@ configuration of fault-injection capabilities. - /sys/kernel/debug/fail*/verbose Format: { 0 | 1 | 2 } + specifies the verbosity of the messages when failure is injected. 
'0' means no messages; '1' will print only a single log line per failure; '2' will print a call trace too -- useful @@ -91,14 +95,15 @@ configuration of fault-injection capabilities. - /sys/kernel/debug/fail*/task-filter: Format: { 'Y' | 'N' } + A value of 'N' disables filtering by process (default). Any positive value limits failures to only processes indicated by /proc//make-it-fail==1. -- /sys/kernel/debug/fail*/require-start: -- /sys/kernel/debug/fail*/require-end: -- /sys/kernel/debug/fail*/reject-start: -- /sys/kernel/debug/fail*/reject-end: +- /sys/kernel/debug/fail*/require-start, + /sys/kernel/debug/fail*/require-end, + /sys/kernel/debug/fail*/reject-start, + /sys/kernel/debug/fail*/reject-end: specifies the range of virtual addresses tested during stacktrace walking. Failure is injected only if some caller @@ -116,6 +121,7 @@ configuration of fault-injection capabilities. - /sys/kernel/debug/fail_page_alloc/ignore-gfp-highmem: Format: { 'Y' | 'N' } + default is 'N', setting it to 'Y' won't inject failures into highmem/user allocations. @@ -123,6 +129,7 @@ configuration of fault-injection capabilities. - /sys/kernel/debug/fail_page_alloc/ignore-gfp-wait: Format: { 'Y' | 'N' } + default is 'N', setting it to 'Y' will inject failures only into non-sleep allocations (GFP_ATOMIC allocations). @@ -134,12 +141,14 @@ configuration of fault-injection capabilities. - /sys/kernel/debug/fail_futex/ignore-private: Format: { 'Y' | 'N' } + default is 'N', setting it to 'Y' will disable failure injections when dealing with private (address space) futexes. - /sys/kernel/debug/fail_function/inject: Format: { 'function-name' | '!function-name' | '' } + specifies the target function of error injection by name. If the function name leads '!' prefix, given function is removed from injection list. If nothing specified ('') @@ -160,10 +169,11 @@ configuration of fault-injection capabilities. function for given function. 
This will be created when user specifies new injection entry. -o Boot option +Boot option +^^^^^^^^^^^ In order to inject faults while debugfs is not available (early boot time), -use the boot option: +use the boot option:: failslab= fail_page_alloc= @@ -171,10 +181,11 @@ use the boot option: fail_futex= mmc_core.fail_request=,,, -o proc entries +proc entries +^^^^^^^^^^^^ -- /proc//fail-nth: -- /proc/self/task//fail-nth: +- /proc//fail-nth, + /proc/self/task//fail-nth: Write to this file of integer N makes N-th call in the task fail. Read from this file returns a integer value. A value of '0' indicates @@ -191,16 +202,16 @@ o proc entries How to add new fault injection capability ----------------------------------------- -o #include +- #include -o define the fault attributes +- define the fault attributes DECLARE_FAULT_ATTR(name); Please see the definition of struct fault_attr in fault-inject.h for details. -o provide a way to configure fault attributes +- provide a way to configure fault attributes - boot option @@ -222,126 +233,126 @@ o provide a way to configure fault attributes single kernel module, it is better to provide module parameters to configure the fault attributes. -o add a hook to insert failures +- add a hook to insert failures - Upon should_fail() returning true, client code should inject a failure. 
+ Upon should_fail() returning true, client code should inject a failure: should_fail(attr, size); Application Examples -------------------- -o Inject slab allocation failures into module init/exit code +- Inject slab allocation failures into module init/exit code:: -#!/bin/bash + #!/bin/bash -FAILTYPE=failslab -echo Y > /sys/kernel/debug/$FAILTYPE/task-filter -echo 10 > /sys/kernel/debug/$FAILTYPE/probability -echo 100 > /sys/kernel/debug/$FAILTYPE/interval -echo -1 > /sys/kernel/debug/$FAILTYPE/times -echo 0 > /sys/kernel/debug/$FAILTYPE/space -echo 2 > /sys/kernel/debug/$FAILTYPE/verbose -echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait + FAILTYPE=failslab + echo Y > /sys/kernel/debug/$FAILTYPE/task-filter + echo 10 > /sys/kernel/debug/$FAILTYPE/probability + echo 100 > /sys/kernel/debug/$FAILTYPE/interval + echo -1 > /sys/kernel/debug/$FAILTYPE/times + echo 0 > /sys/kernel/debug/$FAILTYPE/space + echo 2 > /sys/kernel/debug/$FAILTYPE/verbose + echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait -faulty_system() -{ + faulty_system() + { bash -c "echo 1 > /proc/self/make-it-fail && exec $*" -} + } -if [ $# -eq 0 ] -then + if [ $# -eq 0 ] + then echo "Usage: $0 modulename [ modulename ... ]" exit 1 -fi + fi -for m in $* -do + for m in $* + do echo inserting $m... faulty_system modprobe $m echo removing $m... faulty_system modprobe -r $m -done + done ------------------------------------------------------------------------------ -o Inject page allocation failures only for a specific module +- Inject page allocation failures only for a specific module:: -#!/bin/bash + #!/bin/bash -FAILTYPE=fail_page_alloc -module=$1 + FAILTYPE=fail_page_alloc + module=$1 -if [ -z $module ] -then + if [ -z $module ] + then echo "Usage: $0 " exit 1 -fi + fi -modprobe $module + modprobe $module -if [ ! -d /sys/module/$module/sections ] -then + if [ ! 
-d /sys/module/$module/sections ] + then echo Module $module is not loaded exit 1 -fi + fi -cat /sys/module/$module/sections/.text > /sys/kernel/debug/$FAILTYPE/require-start -cat /sys/module/$module/sections/.data > /sys/kernel/debug/$FAILTYPE/require-end + cat /sys/module/$module/sections/.text > /sys/kernel/debug/$FAILTYPE/require-start + cat /sys/module/$module/sections/.data > /sys/kernel/debug/$FAILTYPE/require-end -echo N > /sys/kernel/debug/$FAILTYPE/task-filter -echo 10 > /sys/kernel/debug/$FAILTYPE/probability -echo 100 > /sys/kernel/debug/$FAILTYPE/interval -echo -1 > /sys/kernel/debug/$FAILTYPE/times -echo 0 > /sys/kernel/debug/$FAILTYPE/space -echo 2 > /sys/kernel/debug/$FAILTYPE/verbose -echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait -echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-highmem -echo 10 > /sys/kernel/debug/$FAILTYPE/stacktrace-depth + echo N > /sys/kernel/debug/$FAILTYPE/task-filter + echo 10 > /sys/kernel/debug/$FAILTYPE/probability + echo 100 > /sys/kernel/debug/$FAILTYPE/interval + echo -1 > /sys/kernel/debug/$FAILTYPE/times + echo 0 > /sys/kernel/debug/$FAILTYPE/space + echo 2 > /sys/kernel/debug/$FAILTYPE/verbose + echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait + echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-highmem + echo 10 > /sys/kernel/debug/$FAILTYPE/stacktrace-depth -trap "echo 0 > /sys/kernel/debug/$FAILTYPE/probability" SIGINT SIGTERM EXIT + trap "echo 0 > /sys/kernel/debug/$FAILTYPE/probability" SIGINT SIGTERM EXIT -echo "Injecting errors into the module $module... (interrupt to stop)" -sleep 1000000 + echo "Injecting errors into the module $module... 
(interrupt to stop)" + sleep 1000000 ------------------------------------------------------------------------------ -o Inject open_ctree error while btrfs mount +- Inject open_ctree error while btrfs mount:: -#!/bin/bash + #!/bin/bash -rm -f testfile.img -dd if=/dev/zero of=testfile.img bs=1M seek=1000 count=1 -DEVICE=$(losetup --show -f testfile.img) -mkfs.btrfs -f $DEVICE -mkdir -p tmpmnt + rm -f testfile.img + dd if=/dev/zero of=testfile.img bs=1M seek=1000 count=1 + DEVICE=$(losetup --show -f testfile.img) + mkfs.btrfs -f $DEVICE + mkdir -p tmpmnt -FAILTYPE=fail_function -FAILFUNC=open_ctree -echo $FAILFUNC > /sys/kernel/debug/$FAILTYPE/inject -echo -12 > /sys/kernel/debug/$FAILTYPE/$FAILFUNC/retval -echo N > /sys/kernel/debug/$FAILTYPE/task-filter -echo 100 > /sys/kernel/debug/$FAILTYPE/probability -echo 0 > /sys/kernel/debug/$FAILTYPE/interval -echo -1 > /sys/kernel/debug/$FAILTYPE/times -echo 0 > /sys/kernel/debug/$FAILTYPE/space -echo 1 > /sys/kernel/debug/$FAILTYPE/verbose + FAILTYPE=fail_function + FAILFUNC=open_ctree + echo $FAILFUNC > /sys/kernel/debug/$FAILTYPE/inject + echo -12 > /sys/kernel/debug/$FAILTYPE/$FAILFUNC/retval + echo N > /sys/kernel/debug/$FAILTYPE/task-filter + echo 100 > /sys/kernel/debug/$FAILTYPE/probability + echo 0 > /sys/kernel/debug/$FAILTYPE/interval + echo -1 > /sys/kernel/debug/$FAILTYPE/times + echo 0 > /sys/kernel/debug/$FAILTYPE/space + echo 1 > /sys/kernel/debug/$FAILTYPE/verbose -mount -t btrfs $DEVICE tmpmnt -if [ $? -ne 0 ] -then + mount -t btrfs $DEVICE tmpmnt + if [ $? -ne 0 ] + then echo "SUCCESS!" -else + else echo "FAILED!" umount tmpmnt -fi + fi -echo > /sys/kernel/debug/$FAILTYPE/inject + echo > /sys/kernel/debug/$FAILTYPE/inject -rmdir tmpmnt -losetup -d $DEVICE -rm testfile.img + rmdir tmpmnt + losetup -d $DEVICE + rm testfile.img Tool to run command with failslab or fail_page_alloc @@ -354,43 +365,43 @@ see the following examples. 
Examples: Run a command "make -C tools/testing/selftests/ run_tests" with injecting slab -allocation failure. +allocation failure:: # ./tools/testing/fault-injection/failcmd.sh \ -- make -C tools/testing/selftests/ run_tests Same as above except to specify 100 times failures at most instead of one time -at most by default. +at most by default:: # ./tools/testing/fault-injection/failcmd.sh --times=100 \ -- make -C tools/testing/selftests/ run_tests Same as above except to inject page allocation failure instead of slab -allocation failure. +allocation failure:: # env FAILCMD_TYPE=fail_page_alloc \ ./tools/testing/fault-injection/failcmd.sh --times=100 \ - -- make -C tools/testing/selftests/ run_tests + -- make -C tools/testing/selftests/ run_tests Systematic faults using fail-nth --------------------------------- The following code systematically faults 0-th, 1-st, 2-nd and so on -capabilities in the socketpair() system call. +capabilities in the socketpair() system call:: -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include + #include + #include + #include + #include + #include + #include + #include + #include + #include + #include -int main() -{ + int main() + { int i, err, res, fail_nth, fds[2]; char buf[128]; @@ -413,23 +424,23 @@ int main() break; } return 0; -} + } -An example output: +An example output:: -1-th fault Y: res=-1/23 -2-th fault Y: res=-1/23 -3-th fault Y: res=-1/12 -4-th fault Y: res=-1/12 -5-th fault Y: res=-1/23 -6-th fault Y: res=-1/23 -7-th fault Y: res=-1/23 -8-th fault Y: res=-1/12 -9-th fault Y: res=-1/12 -10-th fault Y: res=-1/12 -11-th fault Y: res=-1/12 -12-th fault Y: res=-1/12 -13-th fault Y: res=-1/12 -14-th fault Y: res=-1/12 -15-th fault Y: res=-1/12 -16-th fault N: res=0/12 + 1-th fault Y: res=-1/23 + 2-th fault Y: res=-1/23 + 3-th fault Y: res=-1/12 + 4-th fault Y: res=-1/12 + 5-th fault Y: res=-1/23 + 6-th fault Y: res=-1/23 + 7-th fault Y: res=-1/23 + 8-th fault Y: res=-1/12 + 9-th 
fault Y: res=-1/12 + 10-th fault Y: res=-1/12 + 11-th fault Y: res=-1/12 + 12-th fault Y: res=-1/12 + 13-th fault Y: res=-1/12 + 14-th fault Y: res=-1/12 + 15-th fault Y: res=-1/12 + 16-th fault N: res=0/12 diff --git a/Documentation/fault-injection/index.rst b/Documentation/fault-injection/index.rst new file mode 100644 index 000000000000..92b5639ed07a --- /dev/null +++ b/Documentation/fault-injection/index.rst @@ -0,0 +1,20 @@ +:orphan: + +=============== +fault-injection +=============== + +.. toctree:: + :maxdepth: 1 + + fault-injection + notifier-error-inject + nvme-fault-injection + provoke-crashes + +.. only:: subproject and html + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/fault-injection/notifier-error-inject.txt b/Documentation/fault-injection/notifier-error-inject.rst similarity index 83% rename from Documentation/fault-injection/notifier-error-inject.txt rename to Documentation/fault-injection/notifier-error-inject.rst index e861d761de24..1668b6e48d3a 100644 --- a/Documentation/fault-injection/notifier-error-inject.txt +++ b/Documentation/fault-injection/notifier-error-inject.rst @@ -14,7 +14,8 @@ modules that can be used to test the following notifiers. 
PM notifier error injection module ---------------------------------- This feature is controlled through debugfs interface -/sys/kernel/debug/notifier-error-inject/pm/actions//error + + /sys/kernel/debug/notifier-error-inject/pm/actions//error Possible PM notifier events to be failed are: @@ -22,7 +23,7 @@ Possible PM notifier events to be failed are: * PM_SUSPEND_PREPARE * PM_RESTORE_PREPARE -Example: Inject PM suspend error (-12 = -ENOMEM) +Example: Inject PM suspend error (-12 = -ENOMEM):: # cd /sys/kernel/debug/notifier-error-inject/pm/ # echo -12 > actions/PM_SUSPEND_PREPARE/error @@ -32,14 +33,15 @@ Example: Inject PM suspend error (-12 = -ENOMEM) Memory hotplug notifier error injection module ---------------------------------------------- This feature is controlled through debugfs interface -/sys/kernel/debug/notifier-error-inject/memory/actions//error + + /sys/kernel/debug/notifier-error-inject/memory/actions//error Possible memory notifier events to be failed are: * MEM_GOING_ONLINE * MEM_GOING_OFFLINE -Example: Inject memory hotplug offline error (-12 == -ENOMEM) +Example: Inject memory hotplug offline error (-12 == -ENOMEM):: # cd /sys/kernel/debug/notifier-error-inject/memory # echo -12 > actions/MEM_GOING_OFFLINE/error @@ -49,7 +51,8 @@ Example: Inject memory hotplug offline error (-12 == -ENOMEM) powerpc pSeries reconfig notifier error injection module -------------------------------------------------------- This feature is controlled through debugfs interface -/sys/kernel/debug/notifier-error-inject/pSeries-reconfig/actions//error + + /sys/kernel/debug/notifier-error-inject/pSeries-reconfig/actions//error Possible pSeries reconfig notifier events to be failed are: @@ -61,7 +64,8 @@ Possible pSeries reconfig notifier events to be failed are: Netdevice notifier error injection module ---------------------------------------------- This feature is controlled through debugfs interface -/sys/kernel/debug/notifier-error-inject/netdev/actions//error + + 
/sys/kernel/debug/notifier-error-inject/netdev/actions//error Netdevice notifier events which can be failed are: @@ -75,7 +79,7 @@ Netdevice notifier events which can be failed are: * NETDEV_PRECHANGEUPPER * NETDEV_CHANGEUPPER -Example: Inject netdevice mtu change error (-22 == -EINVAL) +Example: Inject netdevice mtu change error (-22 == -EINVAL):: # cd /sys/kernel/debug/notifier-error-inject/netdev # echo -22 > actions/NETDEV_CHANGEMTU/error diff --git a/Documentation/fault-injection/nvme-fault-injection.rst b/Documentation/fault-injection/nvme-fault-injection.rst new file mode 100644 index 000000000000..bbb1bf3e8650 --- /dev/null +++ b/Documentation/fault-injection/nvme-fault-injection.rst @@ -0,0 +1,120 @@ +NVMe Fault Injection +==================== +Linux's fault injection framework provides a systematic way to support +error injection via debugfs in the /sys/kernel/debug directory. When +enabled, the default NVME_SC_INVALID_OPCODE with no retry will be +injected into the nvme_end_request. Users can change the default status +code and no retry flag via the debugfs. The list of Generic Command +Status can be found in include/linux/nvme.h + +Following examples show how to inject an error into the nvme. + +First, enable CONFIG_FAULT_INJECTION_DEBUG_FS kernel config, +recompile the kernel. After booting up the kernel, do the +following. + +Example 1: Inject default status code with no retry +--------------------------------------------------- + +:: + + mount /dev/nvme0n1 /mnt + echo 1 > /sys/kernel/debug/nvme0n1/fault_inject/times + echo 100 > /sys/kernel/debug/nvme0n1/fault_inject/probability + cp a.file /mnt + +Expected Result:: + + cp: cannot stat ‘/mnt/a.file’: Input/output error + +Message from dmesg:: + + FAULT_INJECTION: forcing a failure. 
+ name fault_inject, interval 1, probability 100, space 0, times 1 + CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc8+ #2 + Hardware name: innotek GmbH VirtualBox/VirtualBox, + BIOS VirtualBox 12/01/2006 + Call Trace: + + dump_stack+0x5c/0x7d + should_fail+0x148/0x170 + nvme_should_fail+0x2f/0x50 [nvme_core] + nvme_process_cq+0xe7/0x1d0 [nvme] + nvme_irq+0x1e/0x40 [nvme] + __handle_irq_event_percpu+0x3a/0x190 + handle_irq_event_percpu+0x30/0x70 + handle_irq_event+0x36/0x60 + handle_fasteoi_irq+0x78/0x120 + handle_irq+0xa7/0x130 + ? tick_irq_enter+0xa8/0xc0 + do_IRQ+0x43/0xc0 + common_interrupt+0xa2/0xa2 + + RIP: 0010:native_safe_halt+0x2/0x10 + RSP: 0018:ffffffff82003e90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd + RAX: ffffffff817a10c0 RBX: ffffffff82012480 RCX: 0000000000000000 + RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 + RBP: 0000000000000000 R08: 000000008e38ce64 R09: 0000000000000000 + R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff82012480 + R13: ffffffff82012480 R14: 0000000000000000 R15: 0000000000000000 + ? __sched_text_end+0x4/0x4 + default_idle+0x18/0xf0 + do_idle+0x150/0x1d0 + cpu_startup_entry+0x6f/0x80 + start_kernel+0x4c4/0x4e4 + ? set_init_arg+0x55/0x55 + secondary_startup_64+0xa5/0xb0 + print_req_error: I/O error, dev nvme0n1, sector 9240 + EXT4-fs error (device nvme0n1): ext4_find_entry:1436: + inode #2: comm cp: reading directory lblock 0 + +Example 2: Inject default status code with retry +------------------------------------------------ + +:: + + mount /dev/nvme0n1 /mnt + echo 1 > /sys/kernel/debug/nvme0n1/fault_inject/times + echo 100 > /sys/kernel/debug/nvme0n1/fault_inject/probability + echo 1 > /sys/kernel/debug/nvme0n1/fault_inject/status + echo 0 > /sys/kernel/debug/nvme0n1/fault_inject/dont_retry + + cp a.file /mnt + +Expected Result:: + + command success without error + +Message from dmesg:: + + FAULT_INJECTION: forcing a failure. 
+ name fault_inject, interval 1, probability 100, space 0, times 1 + CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.15.0-rc8+ #4 + Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 + Call Trace: + + dump_stack+0x5c/0x7d + should_fail+0x148/0x170 + nvme_should_fail+0x30/0x60 [nvme_core] + nvme_loop_queue_response+0x84/0x110 [nvme_loop] + nvmet_req_complete+0x11/0x40 [nvmet] + nvmet_bio_done+0x28/0x40 [nvmet] + blk_update_request+0xb0/0x310 + blk_mq_end_request+0x18/0x60 + flush_smp_call_function_queue+0x3d/0xf0 + smp_call_function_single_interrupt+0x2c/0xc0 + call_function_single_interrupt+0xa2/0xb0 + + RIP: 0010:native_safe_halt+0x2/0x10 + RSP: 0018:ffffc9000068bec0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04 + RAX: ffffffff817a10c0 RBX: ffff88011a3c9680 RCX: 0000000000000000 + RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 + RBP: 0000000000000001 R08: 000000008e38c131 R09: 0000000000000000 + R10: 0000000000000000 R11: 0000000000000000 R12: ffff88011a3c9680 + R13: ffff88011a3c9680 R14: 0000000000000000 R15: 0000000000000000 + ? __sched_text_end+0x4/0x4 + default_idle+0x18/0xf0 + do_idle+0x150/0x1d0 + cpu_startup_entry+0x6f/0x80 + start_secondary+0x187/0x1e0 + secondary_startup_64+0xa5/0xb0 diff --git a/Documentation/fault-injection/nvme-fault-injection.txt b/Documentation/fault-injection/nvme-fault-injection.txt deleted file mode 100644 index 8fbf3bf60b62..000000000000 --- a/Documentation/fault-injection/nvme-fault-injection.txt +++ /dev/null @@ -1,116 +0,0 @@ -NVMe Fault Injection -==================== -Linux's fault injection framework provides a systematic way to support -error injection via debugfs in the /sys/kernel/debug directory. When -enabled, the default NVME_SC_INVALID_OPCODE with no retry will be -injected into the nvme_end_request. Users can change the default status -code and no retry flag via the debugfs. 
The list of Generic Command -Status can be found in include/linux/nvme.h - -Following examples show how to inject an error into the nvme. - -First, enable CONFIG_FAULT_INJECTION_DEBUG_FS kernel config, -recompile the kernel. After booting up the kernel, do the -following. - -Example 1: Inject default status code with no retry ---------------------------------------------------- - -mount /dev/nvme0n1 /mnt -echo 1 > /sys/kernel/debug/nvme0n1/fault_inject/times -echo 100 > /sys/kernel/debug/nvme0n1/fault_inject/probability -cp a.file /mnt - -Expected Result: - -cp: cannot stat ‘/mnt/a.file’: Input/output error - -Message from dmesg: - -FAULT_INJECTION: forcing a failure. -name fault_inject, interval 1, probability 100, space 0, times 1 -CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc8+ #2 -Hardware name: innotek GmbH VirtualBox/VirtualBox, -BIOS VirtualBox 12/01/2006 -Call Trace: - - dump_stack+0x5c/0x7d - should_fail+0x148/0x170 - nvme_should_fail+0x2f/0x50 [nvme_core] - nvme_process_cq+0xe7/0x1d0 [nvme] - nvme_irq+0x1e/0x40 [nvme] - __handle_irq_event_percpu+0x3a/0x190 - handle_irq_event_percpu+0x30/0x70 - handle_irq_event+0x36/0x60 - handle_fasteoi_irq+0x78/0x120 - handle_irq+0xa7/0x130 - ? tick_irq_enter+0xa8/0xc0 - do_IRQ+0x43/0xc0 - common_interrupt+0xa2/0xa2 - -RIP: 0010:native_safe_halt+0x2/0x10 -RSP: 0018:ffffffff82003e90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd -RAX: ffffffff817a10c0 RBX: ffffffff82012480 RCX: 0000000000000000 -RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 -RBP: 0000000000000000 R08: 000000008e38ce64 R09: 0000000000000000 -R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff82012480 -R13: ffffffff82012480 R14: 0000000000000000 R15: 0000000000000000 - ? __sched_text_end+0x4/0x4 - default_idle+0x18/0xf0 - do_idle+0x150/0x1d0 - cpu_startup_entry+0x6f/0x80 - start_kernel+0x4c4/0x4e4 - ? 
set_init_arg+0x55/0x55 - secondary_startup_64+0xa5/0xb0 - print_req_error: I/O error, dev nvme0n1, sector 9240 -EXT4-fs error (device nvme0n1): ext4_find_entry:1436: -inode #2: comm cp: reading directory lblock 0 - -Example 2: Inject default status code with retry ------------------------------------------------- - -mount /dev/nvme0n1 /mnt -echo 1 > /sys/kernel/debug/nvme0n1/fault_inject/times -echo 100 > /sys/kernel/debug/nvme0n1/fault_inject/probability -echo 1 > /sys/kernel/debug/nvme0n1/fault_inject/status -echo 0 > /sys/kernel/debug/nvme0n1/fault_inject/dont_retry - -cp a.file /mnt - -Expected Result: - -command success without error - -Message from dmesg: - -FAULT_INJECTION: forcing a failure. -name fault_inject, interval 1, probability 100, space 0, times 1 -CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.15.0-rc8+ #4 -Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 -Call Trace: - - dump_stack+0x5c/0x7d - should_fail+0x148/0x170 - nvme_should_fail+0x30/0x60 [nvme_core] - nvme_loop_queue_response+0x84/0x110 [nvme_loop] - nvmet_req_complete+0x11/0x40 [nvmet] - nvmet_bio_done+0x28/0x40 [nvmet] - blk_update_request+0xb0/0x310 - blk_mq_end_request+0x18/0x60 - flush_smp_call_function_queue+0x3d/0xf0 - smp_call_function_single_interrupt+0x2c/0xc0 - call_function_single_interrupt+0xa2/0xb0 - -RIP: 0010:native_safe_halt+0x2/0x10 -RSP: 0018:ffffc9000068bec0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04 -RAX: ffffffff817a10c0 RBX: ffff88011a3c9680 RCX: 0000000000000000 -RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 -RBP: 0000000000000001 R08: 000000008e38c131 R09: 0000000000000000 -R10: 0000000000000000 R11: 0000000000000000 R12: ffff88011a3c9680 -R13: ffff88011a3c9680 R14: 0000000000000000 R15: 0000000000000000 - ? 
__sched_text_end+0x4/0x4 - default_idle+0x18/0xf0 - do_idle+0x150/0x1d0 - cpu_startup_entry+0x6f/0x80 - start_secondary+0x187/0x1e0 - secondary_startup_64+0xa5/0xb0 diff --git a/Documentation/fault-injection/provoke-crashes.rst b/Documentation/fault-injection/provoke-crashes.rst new file mode 100644 index 000000000000..9279a3e12278 --- /dev/null +++ b/Documentation/fault-injection/provoke-crashes.rst @@ -0,0 +1,48 @@ +=============== +Provoke crashes +=============== + +The lkdtm module provides an interface to crash or injure the kernel at +predefined crashpoints to evaluate the reliability of crash dumps obtained +using different dumping solutions. The module uses KPROBEs to instrument +crashing points, but can also crash the kernel directly without KRPOBE +support. + + +You can provide the way either through module arguments when inserting +the module, or through a debugfs interface. + +Usage:: + + insmod lkdtm.ko [recur_count={>0}] cpoint_name=<> cpoint_type=<> + [cpoint_count={>0}] + +recur_count + Recursion level for the stack overflow test. Default is 10. + +cpoint_name + Crash point where the kernel is to be crashed. It can be + one of INT_HARDWARE_ENTRY, INT_HW_IRQ_EN, INT_TASKLET_ENTRY, + FS_DEVRW, MEM_SWAPOUT, TIMERADD, SCSI_DISPATCH_CMD, + IDE_CORE_CP, DIRECT + +cpoint_type + Indicates the action to be taken on hitting the crash point. + It can be one of PANIC, BUG, EXCEPTION, LOOP, OVERFLOW, + CORRUPT_STACK, UNALIGNED_LOAD_STORE_WRITE, OVERWRITE_ALLOCATION, + WRITE_AFTER_FREE, + +cpoint_count + Indicates the number of times the crash point is to be hit + to trigger an action. The default is 10. + +You can also induce failures by mounting debugfs and writing the type to +/provoke-crash/. E.g.:: + + mount -t debugfs debugfs /mnt + echo EXCEPTION > /mnt/provoke-crash/INT_HARDWARE_ENTRY + + +A special file is `DIRECT` which will induce the crash directly without +KPROBE instrumentation. 
This mode is the only one available when the module +is built on a kernel without KPROBEs support. diff --git a/Documentation/fault-injection/provoke-crashes.txt b/Documentation/fault-injection/provoke-crashes.txt deleted file mode 100644 index 7a9d3d81525b..000000000000 --- a/Documentation/fault-injection/provoke-crashes.txt +++ /dev/null @@ -1,38 +0,0 @@ -The lkdtm module provides an interface to crash or injure the kernel at -predefined crashpoints to evaluate the reliability of crash dumps obtained -using different dumping solutions. The module uses KPROBEs to instrument -crashing points, but can also crash the kernel directly without KRPOBE -support. - - -You can provide the way either through module arguments when inserting -the module, or through a debugfs interface. - -Usage: insmod lkdtm.ko [recur_count={>0}] cpoint_name=<> cpoint_type=<> - [cpoint_count={>0}] - - recur_count : Recursion level for the stack overflow test. Default is 10. - - cpoint_name : Crash point where the kernel is to be crashed. It can be - one of INT_HARDWARE_ENTRY, INT_HW_IRQ_EN, INT_TASKLET_ENTRY, - FS_DEVRW, MEM_SWAPOUT, TIMERADD, SCSI_DISPATCH_CMD, - IDE_CORE_CP, DIRECT - - cpoint_type : Indicates the action to be taken on hitting the crash point. - It can be one of PANIC, BUG, EXCEPTION, LOOP, OVERFLOW, - CORRUPT_STACK, UNALIGNED_LOAD_STORE_WRITE, OVERWRITE_ALLOCATION, - WRITE_AFTER_FREE, - - cpoint_count : Indicates the number of times the crash point is to be hit - to trigger an action. The default is 10. - -You can also induce failures by mounting debugfs and writing the type to -/provoke-crash/. E.g., - - mount -t debugfs debugfs /mnt - echo EXCEPTION > /mnt/provoke-crash/INT_HARDWARE_ENTRY - - -A special file is `DIRECT' which will induce the crash directly without -KPROBE instrumentation. This mode is the only one available when the module -is built on a kernel without KPROBEs support. 
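The debugfs interface described in provoke-crashes.rst above can be sketched as a small dry-run helper. This is only an illustration under stated assumptions: the function names lkdtm_path and lkdtm_dry_run are hypothetical (they are not part of lkdtm or the kernel tree), debugfs is assumed to be mounted at /mnt as in the documented example, and the helper only prints the command instead of running it, since a real write would crash the kernel:

```shell
#!/bin/sh
# Hypothetical helper names; not part of lkdtm or the kernel tree.

# Build the debugfs file for a crash point,
# e.g. /mnt/provoke-crash/DIRECT.
lkdtm_path() {
    mnt=$1
    cpoint=$2
    printf '%s/provoke-crash/%s\n' "$mnt" "$cpoint"
}

# Print (do NOT execute) the command that would write a crash type
# to the crash point file. Executing it would crash the machine.
lkdtm_dry_run() {
    mnt=$1
    cpoint=$2
    ctype=$3
    printf 'echo %s > %s\n' "$ctype" "$(lkdtm_path "$mnt" "$cpoint")"
}

lkdtm_dry_run /mnt INT_HARDWARE_ENTRY EXCEPTION
```

Printing the command first makes it easy to check the crash point and crash type before running it on a disposable test machine.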
diff --git a/Documentation/process/4.Coding.rst b/Documentation/process/4.Coding.rst index 4b7a5ab3cec1..13dd893c9f88 100644 --- a/Documentation/process/4.Coding.rst +++ b/Documentation/process/4.Coding.rst @@ -298,7 +298,7 @@ enabled, a configurable percentage of memory allocations will be made to fail; these failures can be restricted to a specific range of code. Running with fault injection enabled allows the programmer to see how the code responds when things go badly. See -Documentation/fault-injection/fault-injection.txt for more information on +Documentation/fault-injection/fault-injection.rst for more information on how to use this facility. Other kinds of errors can be found with the "sparse" static analysis tool. diff --git a/Documentation/translations/it_IT/process/4.Coding.rst b/Documentation/translations/it_IT/process/4.Coding.rst index c05b89e616dd..a5e36aa60448 100644 --- a/Documentation/translations/it_IT/process/4.Coding.rst +++ b/Documentation/translations/it_IT/process/4.Coding.rst @@ -314,7 +314,7 @@ di allocazione di memoria sarà destinata al fallimento; questi fallimenti possono essere ridotti ad uno specifico pezzo di codice. Procedere con l'inserimento dei fallimenti attivo permette al programmatore di verificare come il codice risponde quando le cose vanno male. Consultate: -Documentation/fault-injection/fault-injection.txt per avere maggiori +Documentation/fault-injection/fault-injection.rst per avere maggiori informazioni su come utilizzare questo strumento. 
Altre tipologie di errori possono essere riscontrati con lo strumento di diff --git a/Documentation/translations/zh_CN/process/4.Coding.rst b/Documentation/translations/zh_CN/process/4.Coding.rst index 8bb777941394..b82b1dde3122 100644 --- a/Documentation/translations/zh_CN/process/4.Coding.rst +++ b/Documentation/translations/zh_CN/process/4.Coding.rst @@ -205,7 +205,7 @@ Linus对这个问题给出了最佳答案: 启用故障注入后,内存分配的可配置百分比将失败;这些失败可以限制在特定的代码 范围内。在启用了故障注入的情况下运行,程序员可以看到当情况恶化时代码如何响 应。有关如何使用此工具的详细信息,请参阅 -Documentation/fault-injection/fault-injection.txt。 +Documentation/fault-injection/fault-injection.rst。 使用“sparse”静态分析工具可以发现其他类型的错误。对于sparse,可以警告程序员 用户空间和内核空间地址之间的混淆、big endian和small endian数量的混合、在需 diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c index 8a1428d4f138..bba49abb6750 100644 --- a/drivers/misc/lkdtm/core.c +++ b/drivers/misc/lkdtm/core.c @@ -15,7 +15,7 @@ * * Debugfs support added by Simon Kagstrom * - * See Documentation/fault-injection/provoke-crashes.txt for instructions + * See Documentation/fault-injection/provoke-crashes.rst for instructions */ #include "lkdtm.h" #include diff --git a/include/linux/fault-inject.h b/include/linux/fault-inject.h index 7e6c77740413..e525f6957c49 100644 --- a/include/linux/fault-inject.h +++ b/include/linux/fault-inject.h @@ -11,7 +11,7 @@ /* * For explanation of the elements of this struct, see - * Documentation/fault-injection/fault-injection.txt + * Documentation/fault-injection/fault-injection.rst */ struct fault_attr { unsigned long probability; diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index cbdfae379896..4d42a9a6006d 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1701,7 +1701,7 @@ config LKDTM called lkdtm. 
Documentation on how to use the module can be found in - Documentation/fault-injection/provoke-crashes.txt + Documentation/fault-injection/provoke-crashes.rst config TEST_LIST_SORT tristate "Linked list sorting test" diff --git a/tools/testing/fault-injection/failcmd.sh b/tools/testing/fault-injection/failcmd.sh index 29a6c63c5a15..78dac34264be 100644 --- a/tools/testing/fault-injection/failcmd.sh +++ b/tools/testing/fault-injection/failcmd.sh @@ -42,7 +42,7 @@ OPTIONS --interval=value, --space=value, --verbose=value, --task-filter=value, --stacktrace-depth=value, --require-start=value, --require-end=value, --reject-start=value, --reject-end=value, --ignore-gfp-wait=value - See Documentation/fault-injection/fault-injection.txt for more + See Documentation/fault-injection/fault-injection.rst for more information failslab options: From ab42b818954c040fa13639dc031d8541edcecb4b Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Wed, 12 Jun 2019 14:52:45 -0300 Subject: [PATCH 074/129] docs: fb: convert docs to ReST and rename to *.rst The conversion is actually: - add blank lines and indentation in order to identify paragraphs; - fix tables markups; - add some lists markups; - mark literal blocks; - adjust title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Also, removed the Maintained by, as requested by Geert.
Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Jonathan Corbet --- .../admin-guide/kernel-parameters.txt | 2 +- Documentation/fb/{api.txt => api.rst} | 29 +- Documentation/fb/{arkfb.txt => arkfb.rst} | 8 +- .../fb/{aty128fb.txt => aty128fb.rst} | 35 +- .../fb/{cirrusfb.txt => cirrusfb.rst} | 47 +- .../fb/{cmap_xfbdev.txt => cmap_xfbdev.rst} | 59 +-- .../fb/{deferred_io.txt => deferred_io.rst} | 28 +- Documentation/fb/{efifb.txt => efifb.rst} | 18 +- .../fb/{ep93xx-fb.txt => ep93xx-fb.rst} | 27 +- Documentation/fb/{fbcon.txt => fbcon.rst} | 177 +++---- .../fb/{framebuffer.txt => framebuffer.rst} | 80 ++-- Documentation/fb/{gxfb.txt => gxfb.rst} | 24 +- Documentation/fb/index.rst | 50 ++ .../fb/{intel810.txt => intel810.rst} | 79 ++-- Documentation/fb/{intelfb.txt => intelfb.rst} | 62 +-- .../fb/{internals.txt => internals.rst} | 24 +- Documentation/fb/{lxfb.txt => lxfb.rst} | 25 +- Documentation/fb/matroxfb.rst | 443 ++++++++++++++++++ Documentation/fb/matroxfb.txt | 413 ---------------- .../fb/{metronomefb.txt => metronomefb.rst} | 8 +- Documentation/fb/{modedb.txt => modedb.rst} | 44 +- Documentation/fb/pvr2fb.rst | 66 +++ Documentation/fb/pvr2fb.txt | 65 --- Documentation/fb/{pxafb.txt => pxafb.rst} | 81 +++- Documentation/fb/{s3fb.txt => s3fb.rst} | 8 +- .../fb/{sa1100fb.txt => sa1100fb.rst} | 23 +- Documentation/fb/sh7760fb.rst | 130 +++++ Documentation/fb/sh7760fb.txt | 131 ------ Documentation/fb/{sisfb.txt => sisfb.rst} | 40 +- Documentation/fb/{sm501.txt => sm501.rst} | 7 +- Documentation/fb/{sm712fb.txt => sm712fb.rst} | 18 +- Documentation/fb/sstfb.rst | 207 ++++++++ Documentation/fb/sstfb.txt | 174 ------- Documentation/fb/{tgafb.txt => tgafb.rst} | 30 +- .../fb/{tridentfb.txt => tridentfb.rst} | 36 +- Documentation/fb/{udlfb.txt => udlfb.rst} | 55 ++- Documentation/fb/{uvesafb.txt => uvesafb.rst} | 128 ++--- Documentation/fb/{vesafb.txt => vesafb.rst} | 121 ++--- Documentation/fb/viafb.rst | 297 ++++++++++++ Documentation/fb/viafb.txt | 252 
---------- .../fb/{vt8623fb.txt => vt8623fb.rst} | 10 +- MAINTAINERS | 10 +- drivers/tty/Kconfig | 2 +- drivers/video/fbdev/Kconfig | 24 +- drivers/video/fbdev/matrox/matroxfb_base.c | 2 +- drivers/video/fbdev/pxafb.c | 2 +- drivers/video/fbdev/sh7760fb.c | 2 +- 47 files changed, 1945 insertions(+), 1658 deletions(-) rename Documentation/fb/{api.txt => api.rst} (97%) rename Documentation/fb/{arkfb.txt => arkfb.rst} (92%) rename Documentation/fb/{aty128fb.txt => aty128fb.rst} (61%) rename Documentation/fb/{cirrusfb.txt => cirrusfb.rst} (75%) rename Documentation/fb/{cmap_xfbdev.txt => cmap_xfbdev.rst} (50%) rename Documentation/fb/{deferred_io.txt => deferred_io.rst} (86%) rename Documentation/fb/{efifb.txt => efifb.rst} (75%) rename Documentation/fb/{ep93xx-fb.txt => ep93xx-fb.rst} (85%) rename Documentation/fb/{fbcon.txt => fbcon.rst} (69%) rename Documentation/fb/{framebuffer.txt => framebuffer.rst} (92%) rename Documentation/fb/{gxfb.txt => gxfb.rst} (60%) create mode 100644 Documentation/fb/index.rst rename Documentation/fb/{intel810.txt => intel810.rst} (83%) rename Documentation/fb/{intelfb.txt => intelfb.rst} (73%) rename Documentation/fb/{internals.txt => internals.rst} (82%) rename Documentation/fb/{lxfb.txt => lxfb.rst} (60%) create mode 100644 Documentation/fb/matroxfb.rst delete mode 100644 Documentation/fb/matroxfb.txt rename Documentation/fb/{metronomefb.txt => metronomefb.rst} (98%) rename Documentation/fb/{modedb.txt => modedb.rst} (87%) create mode 100644 Documentation/fb/pvr2fb.rst delete mode 100644 Documentation/fb/pvr2fb.txt rename Documentation/fb/{pxafb.txt => pxafb.rst} (78%) rename Documentation/fb/{s3fb.txt => s3fb.rst} (94%) rename Documentation/fb/{sa1100fb.txt => sa1100fb.rst} (64%) create mode 100644 Documentation/fb/sh7760fb.rst delete mode 100644 Documentation/fb/sh7760fb.txt rename Documentation/fb/{sisfb.txt => sisfb.rst} (85%) rename Documentation/fb/{sm501.txt => sm501.rst} (65%) rename Documentation/fb/{sm712fb.txt => 
sm712fb.rst} (59%) create mode 100644 Documentation/fb/sstfb.rst delete mode 100644 Documentation/fb/sstfb.txt rename Documentation/fb/{tgafb.txt => tgafb.rst} (71%) rename Documentation/fb/{tridentfb.txt => tridentfb.rst} (70%) rename Documentation/fb/{udlfb.txt => udlfb.rst} (77%) rename Documentation/fb/{uvesafb.txt => uvesafb.rst} (52%) rename Documentation/fb/{vesafb.txt => vesafb.rst} (57%) create mode 100644 Documentation/fb/viafb.rst delete mode 100644 Documentation/fb/viafb.txt rename Documentation/fb/{vt8623fb.txt => vt8623fb.rst} (85%) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 9b16b640ce48..83d6560f10f0 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -5024,7 +5024,7 @@ vector=percpu: enable percpu vector domain video= [FB] Frame buffer configuration - See Documentation/fb/modedb.txt. + See Documentation/fb/modedb.rst. video.brightness_switch_enabled= [0,1] If set to 1, on receiving an ACPI notify event diff --git a/Documentation/fb/api.txt b/Documentation/fb/api.rst similarity index 97% rename from Documentation/fb/api.txt rename to Documentation/fb/api.rst index d52cf1e3b975..79ec33dded74 100644 --- a/Documentation/fb/api.txt +++ b/Documentation/fb/api.rst @@ -1,5 +1,6 @@ - The Frame Buffer Device API - --------------------------- +=========================== +The Frame Buffer Device API +=========================== Last revised: June 21, 2011 @@ -21,13 +22,13 @@ deal with different behaviours. --------------- Device and driver capabilities are reported in the fixed screen information -capabilities field. +capabilities field:: -struct fb_fix_screeninfo { + struct fb_fix_screeninfo { ... __u16 capabilities; /* see FB_CAP_* */ ... -}; + }; Application should use those capabilities to find out what features they can expect from the device and driver. 
@@ -151,9 +152,9 @@ fb_fix_screeninfo and fb_var_screeninfo structure respectively. struct fb_fix_screeninfo stores device independent unchangeable information about the frame buffer device and the current format. Those information can't be directly modified by applications, but can be changed by the driver when an -application modifies the format. +application modifies the format:: -struct fb_fix_screeninfo { + struct fb_fix_screeninfo { char id[16]; /* identification string eg "TT Builtin" */ unsigned long smem_start; /* Start of frame buffer mem */ /* (physical address) */ @@ -172,13 +173,13 @@ struct fb_fix_screeninfo { /* specific chip/card we have */ __u16 capabilities; /* see FB_CAP_* */ __u16 reserved[2]; /* Reserved for future compatibility */ -}; + }; struct fb_var_screeninfo stores device independent changeable information about a frame buffer device, its current format and video mode, as well as -other miscellaneous parameters. +other miscellaneous parameters:: -struct fb_var_screeninfo { + struct fb_var_screeninfo { __u32 xres; /* visible resolution */ __u32 yres; __u32 xres_virtual; /* virtual resolution */ @@ -216,7 +217,7 @@ struct fb_var_screeninfo { __u32 rotate; /* angle we rotate counter clockwise */ __u32 colorspace; /* colorspace for FOURCC-based modes */ __u32 reserved[4]; /* Reserved for future compatibility */ -}; + }; To modify variable information, applications call the FBIOPUT_VSCREENINFO ioctl with a pointer to a fb_var_screeninfo structure. If the call is @@ -255,14 +256,14 @@ monochrome, grayscale or pseudocolor visuals, although this is not required. - For truecolor and directcolor formats, applications set the grayscale field to zero, and the red, blue, green and transp fields to describe the layout of - color components in memory. 
+ color components in memory:: -struct fb_bitfield { + struct fb_bitfield { __u32 offset; /* beginning of bitfield */ __u32 length; /* length of bitfield */ __u32 msb_right; /* != 0 : Most significant bit is */ /* right */ -}; + }; Pixel values are bits_per_pixel wide and are split in non-overlapping red, green, blue and alpha (transparency) components. Location and size of each diff --git a/Documentation/fb/arkfb.txt b/Documentation/fb/arkfb.rst similarity index 92% rename from Documentation/fb/arkfb.txt rename to Documentation/fb/arkfb.rst index e8487a9d6a05..aeca8773dd7e 100644 --- a/Documentation/fb/arkfb.txt +++ b/Documentation/fb/arkfb.rst @@ -1,6 +1,6 @@ - - arkfb - fbdev driver for ARK Logic chips - ======================================== +======================================== +arkfb - fbdev driver for ARK Logic chips +======================================== Supported Hardware @@ -47,7 +47,7 @@ Missing Features (alias TODO list) * secondary (not initialized by BIOS) device support - * big endian support + * big endian support * DPMS support * MMIO support * interlaced mode variant diff --git a/Documentation/fb/aty128fb.txt b/Documentation/fb/aty128fb.rst similarity index 61% rename from Documentation/fb/aty128fb.txt rename to Documentation/fb/aty128fb.rst index b605204fcfe1..3f107718f933 100644 --- a/Documentation/fb/aty128fb.txt +++ b/Documentation/fb/aty128fb.rst @@ -1,8 +1,9 @@ -[This file is cloned from VesaFB/matroxfb] - +================= What is aty128fb? ================= +.. [This file is cloned from VesaFB/matroxfb] + This is a driver for a graphic framebuffer for ATI Rage128 based devices on Intel and PPC boxes. @@ -24,15 +25,15 @@ How to use it? ============== Switching modes is done using the video=aty128fb:... modedb -boot parameter or using `fbset' program. +boot parameter or using `fbset` program. -See Documentation/fb/modedb.txt for more information on modedb +See Documentation/fb/modedb.rst for more information on modedb resolutions. 
You should compile in both vgacon (to boot if you remove your Rage128 from box) and aty128fb (for graphics mode). You should not compile-in vesafb -unless you have primary display on non-Rage128 VBE2.0 device (see -Documentation/fb/vesafb.txt for details). +unless you have primary display on non-Rage128 VBE2.0 device (see +Documentation/fb/vesafb.rst for details). X11 @@ -48,16 +49,18 @@ Configuration ============= You can pass kernel command line options to vesafb with -`video=aty128fb:option1,option2:value2,option3' (multiple options should -be separated by comma, values are separated from options by `:'). +`video=aty128fb:option1,option2:value2,option3` (multiple options should +be separated by comma, values are separated from options by `:`). Accepted options: -noaccel - do not use acceleration engine. It is default. -accel - use acceleration engine. Not finished. -vmode:x - chooses PowerMacintosh video mode . Deprecated. -cmode:x - chooses PowerMacintosh colour mode . Deprecated. - - selects startup videomode. See modedb.txt for detailed - explanation. Default is 640x480x8bpp. +========= ======================================================= +noaccel do not use acceleration engine. It is default. +accel use acceleration engine. Not finished. +vmode:x chooses PowerMacintosh video mode . Deprecated. +cmode:x chooses PowerMacintosh colour mode . Deprecated. + selects startup videomode. See modedb.txt for detailed + explanation. Default is 640x480x8bpp. +========= ======================================================= Limitations @@ -65,8 +68,8 @@ Limitations There are known and unknown bugs, features and misfeatures. Currently there are following known bugs: - + This driver is still experimental and is not finished. Too many + + - This driver is still experimental and is not finished. Too many bugs/errata to list here. 
--- Brad Douglas diff --git a/Documentation/fb/cirrusfb.txt b/Documentation/fb/cirrusfb.rst similarity index 75% rename from Documentation/fb/cirrusfb.txt rename to Documentation/fb/cirrusfb.rst index f75950d330a4..8c3e6c6cb114 100644 --- a/Documentation/fb/cirrusfb.txt +++ b/Documentation/fb/cirrusfb.rst @@ -1,32 +1,32 @@ +============================================ +Framebuffer driver for Cirrus Logic chipsets +============================================ - Framebuffer driver for Cirrus Logic chipsets - Copyright 1999 Jeff Garzik +Copyright 1999 Jeff Garzik - -{ just a little something to get people going; contributors welcome! } - +.. just a little something to get people going; contributors welcome! Chip families supported: - SD64 - Piccolo - Picasso - Spectrum - Alpine (GD-543x/4x) - Picasso4 (GD-5446) - GD-5480 - Laguna (GD-546x) + - SD64 + - Piccolo + - Picasso + - Spectrum + - Alpine (GD-543x/4x) + - Picasso4 (GD-5446) + - GD-5480 + - Laguna (GD-546x) Bus's supported: - PCI - Zorro + - PCI + - Zorro Architectures supported: - i386 - Alpha - PPC (Motorola Powerstack) - m68k (Amiga) + - i386 + - Alpha + - PPC (Motorola Powerstack) + - m68k (Amiga) @@ -34,10 +34,9 @@ Default video modes ------------------- At the moment, there are two kernel command line arguments supported: -mode:640x480 -mode:800x600 - or -mode:1024x768 +- mode:640x480 +- mode:800x600 +- mode:1024x768 Full support for startup video modes (modedb) will be integrated soon. @@ -93,5 +92,3 @@ Version 1.9.4 Version 1.9.3 ------------- * Bundled with kernel 2.3.14-pre1 or later. 
- - diff --git a/Documentation/fb/cmap_xfbdev.txt b/Documentation/fb/cmap_xfbdev.rst similarity index 50% rename from Documentation/fb/cmap_xfbdev.txt rename to Documentation/fb/cmap_xfbdev.rst index 55e1f0a3d2b4..5db5e9787361 100644 --- a/Documentation/fb/cmap_xfbdev.txt +++ b/Documentation/fb/cmap_xfbdev.rst @@ -1,26 +1,29 @@ +========================== Understanding fbdev's cmap --------------------------- +========================== These notes explain how X's dix layer uses fbdev's cmap structures. -*. example of relevant structures in fbdev as used for a 3-bit grayscale cmap -struct fb_var_screeninfo { - .bits_per_pixel = 8, - .grayscale = 1, - .red = { 4, 3, 0 }, - .green = { 0, 0, 0 }, - .blue = { 0, 0, 0 }, -} -struct fb_fix_screeninfo { - .visual = FB_VISUAL_STATIC_PSEUDOCOLOR, -} -for (i = 0; i < 8; i++) - info->cmap.red[i] = (((2*i)+1)*(0xFFFF))/16; -memcpy(info->cmap.green, info->cmap.red, sizeof(u16)*8); -memcpy(info->cmap.blue, info->cmap.red, sizeof(u16)*8); +- example of relevant structures in fbdev as used for a 3-bit grayscale cmap:: -*. X11 apps do something like the following when trying to use grayscale. 
-for (i=0; i < 8; i++) { + struct fb_var_screeninfo { + .bits_per_pixel = 8, + .grayscale = 1, + .red = { 4, 3, 0 }, + .green = { 0, 0, 0 }, + .blue = { 0, 0, 0 }, + } + struct fb_fix_screeninfo { + .visual = FB_VISUAL_STATIC_PSEUDOCOLOR, + } + for (i = 0; i < 8; i++) + info->cmap.red[i] = (((2*i)+1)*(0xFFFF))/16; + memcpy(info->cmap.green, info->cmap.red, sizeof(u16)*8); + memcpy(info->cmap.blue, info->cmap.red, sizeof(u16)*8); + +- X11 apps do something like the following when trying to use grayscale:: + + for (i=0; i < 8; i++) { char colorspec[64]; memset(colorspec,0,64); sprintf(colorspec, "rgb:%x/%x/%x", i*36,i*36,i*36); @@ -28,26 +31,26 @@ for (i=0; i < 8; i++) { printf("Can't get color %s\n",colorspec); XAllocColor(outputDisplay, testColormap, &wantedColor); grays[i] = wantedColor; -} + } + There's also named equivalents like gray1..x provided you have an rgb.txt. Somewhere in X's callchain, this results in a call to X code that handles the colormap. For example, Xfbdev hits the following: -xc-011010/programs/Xserver/dix/colormap.c: +xc-011010/programs/Xserver/dix/colormap.c:: -FindBestPixel(pentFirst, size, prgb, channel) + FindBestPixel(pentFirst, size, prgb, channel) -dr = (long) pent->co.local.red - prgb->red; -dg = (long) pent->co.local.green - prgb->green; -db = (long) pent->co.local.blue - prgb->blue; -sq = dr * dr; -UnsignedToBigNum (sq, &sum); -BigNumAdd (&sum, &temp, &sum); + dr = (long) pent->co.local.red - prgb->red; + dg = (long) pent->co.local.green - prgb->green; + db = (long) pent->co.local.blue - prgb->blue; + sq = dr * dr; + UnsignedToBigNum (sq, &sum); + BigNumAdd (&sum, &temp, &sum); co.local.red are entries that were brought in through FBIOGETCMAP which come directly from the info->cmap.red that was listed above. The prgb is the rgb that the app wants to match to. The above code is doing what looks like a least squares matching function. That's why the cmap entries can't be set to the left hand side boundaries of a color range. 
- diff --git a/Documentation/fb/deferred_io.txt b/Documentation/fb/deferred_io.rst similarity index 86% rename from Documentation/fb/deferred_io.txt rename to Documentation/fb/deferred_io.rst index 748328370250..7300cff255a3 100644 --- a/Documentation/fb/deferred_io.txt +++ b/Documentation/fb/deferred_io.rst @@ -1,5 +1,6 @@ +=========== Deferred IO ------------ +=========== Deferred IO is a way to delay and repurpose IO. It uses host memory as a buffer and the MMU pagefault as a pretrigger for when to perform the device @@ -16,7 +17,7 @@ works: - app continues writing to that page with no additional cost. this is the key benefit. - the workqueue task comes in and mkcleans the pages on the list, then - completes the work associated with updating the framebuffer. this is + completes the work associated with updating the framebuffer. this is the real work talking to the device. - app tries to write to the address (that has now been mkcleaned) - get pagefault and the above sequence occurs again @@ -47,29 +48,32 @@ How to use it: (for fbdev drivers) ---------------------------------- The following example may be helpful. -1. Setup your structure. Eg: +1. Setup your structure. Eg:: -static struct fb_deferred_io hecubafb_defio = { - .delay = HZ, - .deferred_io = hecubafb_dpy_deferred_io, -}; + static struct fb_deferred_io hecubafb_defio = { + .delay = HZ, + .deferred_io = hecubafb_dpy_deferred_io, + }; The delay is the minimum delay between when the page_mkwrite trigger occurs and when the deferred_io callback is called. The deferred_io callback is explained below. -2. Setup your deferred IO callback. Eg: -static void hecubafb_dpy_deferred_io(struct fb_info *info, - struct list_head *pagelist) +2. Setup your deferred IO callback. Eg:: + + static void hecubafb_dpy_deferred_io(struct fb_info *info, + struct list_head *pagelist) The deferred_io callback is where you would perform all your IO to the display device. 
You receive the pagelist which is the list of pages that were written to during the delay. You must not modify this list. This callback is called from a workqueue. -3. Call init +3. Call init:: + info->fbdefio = &hecubafb_defio; fb_deferred_io_init(info); -4. Call cleanup +4. Call cleanup:: + fb_deferred_io_cleanup(info); diff --git a/Documentation/fb/efifb.txt b/Documentation/fb/efifb.rst similarity index 75% rename from Documentation/fb/efifb.txt rename to Documentation/fb/efifb.rst index 1a85c1bdaf38..04840331a00e 100644 --- a/Documentation/fb/efifb.txt +++ b/Documentation/fb/efifb.rst @@ -1,6 +1,6 @@ - +============== What is efifb? -=============== +============== This is a generic EFI platform driver for Intel based Apple computers. efifb is only for EFI booted Intel Macs. @@ -8,16 +8,17 @@ efifb is only for EFI booted Intel Macs. Supported Hardware ================== -iMac 17"/20" -Macbook -Macbook Pro 15"/17" -MacMini +- iMac 17"/20" +- Macbook +- Macbook Pro 15"/17" +- MacMini How to use it? ============== efifb does not have any kind of autodetection of your machine. -You have to add the following kernel parameters in your elilo.conf: +You have to add the following kernel parameters in your elilo.conf:: + Macbook : video=efifb:macbook MacMini : @@ -29,9 +30,10 @@ You have to add the following kernel parameters in your elilo.conf: Accepted options: +======= =========================================================== nowc Don't map the framebuffer write combined. This can be used to workaround side-effects and slowdowns on other CPU cores when large amounts of console data are written. 
+======= =========================================================== --- Edgar Hucek diff --git a/Documentation/fb/ep93xx-fb.txt b/Documentation/fb/ep93xx-fb.rst similarity index 85% rename from Documentation/fb/ep93xx-fb.txt rename to Documentation/fb/ep93xx-fb.rst index 5af1bd9effae..6f7767926d1a 100644 --- a/Documentation/fb/ep93xx-fb.txt +++ b/Documentation/fb/ep93xx-fb.rst @@ -4,7 +4,7 @@ Driver for EP93xx LCD controller The EP93xx LCD controller can drive both standard desktop monitors and embedded LCD displays. If you have a standard desktop monitor then you -can use the standard Linux video mode database. In your board file: +can use the standard Linux video mode database. In your board file:: static struct ep93xxfb_mach_info some_board_fb_info = { .num_modes = EP93XXFB_USE_MODEDB, @@ -12,7 +12,7 @@ can use the standard Linux video mode database. In your board file: }; If you have an embedded LCD display then you need to define a video -mode for it as follows: +mode for it as follows:: static struct fb_videomode some_board_video_modes[] = { { @@ -23,11 +23,11 @@ mode for it as follows: Note that the pixel clock value is in pico-seconds. You can use the KHZ2PICOS macro to convert the pixel clock value. Most other values -are in pixel clocks. See Documentation/fb/framebuffer.txt for further +are in pixel clocks. See Documentation/fb/framebuffer.rst for further details. The ep93xxfb_mach_info structure for your board should look like the -following: +following:: static struct ep93xxfb_mach_info some_board_fb_info = { .num_modes = ARRAY_SIZE(some_board_video_modes), @@ -37,7 +37,7 @@ following: }; The framebuffer device can be registered by adding the following to -your board initialisation function: +your board initialisation function:: ep93xx_register_fb(&some_board_fb_info); @@ -50,6 +50,7 @@ to configure the controller. The video attributes flags are fully documented in section 7 of the EP93xx users' guide. 
The following flags are available: +=============================== ========================================== EP93XXFB_PCLK_FALLING Clock data on the falling edge of the pixel clock. The default is to clock data on the rising edge. @@ -62,10 +63,12 @@ EP93XXFB_SYNC_HORIZ_HIGH Horizontal sync is active high. By EP93XXFB_SYNC_VERT_HIGH Vertical sync is active high. By default the vertical sync is active high. +=============================== ========================================== The physical address of the framebuffer can be controlled using the following flags: +=============================== ====================================== EP93XXFB_USE_SDCSN0 Use SDCSn[0] for the framebuffer. This is the default setting. @@ -74,6 +77,7 @@ EP93XXFB_USE_SDCSN1 Use SDCSn[1] for the framebuffer. EP93XXFB_USE_SDCSN2 Use SDCSn[2] for the framebuffer. EP93XXFB_USE_SDCSN3 Use SDCSn[3] for the framebuffer. +=============================== ====================================== ================== Platform callbacks @@ -87,7 +91,7 @@ blanked or unblanked. The setup and teardown devices pass the platform_device structure as an argument. 
The fb_info and ep93xxfb_mach_info structures can be -obtained as follows: +obtained as follows:: static int some_board_fb_setup(struct platform_device *pdev) { @@ -101,17 +105,17 @@ obtained as follows: Setting the video mode ====================== -The video mode is set using the following syntax: +The video mode is set using the following syntax:: video=XRESxYRES[-BPP][@REFRESH] If the EP93xx video driver is built-in then the video mode is set on -the Linux kernel command line, for example: +the Linux kernel command line, for example:: video=ep93xx-fb:800x600-16@60 If the EP93xx video driver is built as a module then the video mode is -set when the module is installed: +set when the module is installed:: modprobe ep93xx-fb video=320x240 @@ -121,13 +125,14 @@ Screenpage bug At least on the EP9315 there is a silicon bug which causes bit 27 of the VIDSCRNPAGE (framebuffer physical offset) to be tied low. There is -an unofficial errata for this bug at: +an unofficial errata for this bug at:: + http://marc.info/?l=linux-arm-kernel&m=110061245502000&w=2 By default the EP93xx framebuffer driver checks if the allocated physical address has bit 27 set. If it does, then the memory is freed and an error is returned. The check can be disabled by adding the following -option when loading the driver: +option when loading the driver:: ep93xx-fb.check_screenpage_bug=0 diff --git a/Documentation/fb/fbcon.txt b/Documentation/fb/fbcon.rst similarity index 69% rename from Documentation/fb/fbcon.txt rename to Documentation/fb/fbcon.rst index 60a5ec04e8f0..cfb9f7c38f18 100644 --- a/Documentation/fb/fbcon.txt +++ b/Documentation/fb/fbcon.rst @@ -1,39 +1,41 @@ +======================= The Framebuffer Console ======================= - The framebuffer console (fbcon), as its name implies, is a text +The framebuffer console (fbcon), as its name implies, is a text console running on top of the framebuffer device. 
It has the functionality of any standard text console driver, such as the VGA console, with the added features that can be attributed to the graphical nature of the framebuffer. - In the x86 architecture, the framebuffer console is optional, and +In the x86 architecture, the framebuffer console is optional, and some even treat it as a toy. For other architectures, it is the only available display device, text or graphical. - What are the features of fbcon? The framebuffer console supports +What are the features of fbcon? The framebuffer console supports high resolutions, varying font types, display rotation, primitive multihead, etc. Theoretically, multi-colored fonts, blending, aliasing, and any feature made available by the underlying graphics card are also possible. A. Configuration +================ - The framebuffer console can be enabled by using your favorite kernel +The framebuffer console can be enabled by using your favorite kernel configuration tool. It is under Device Drivers->Graphics Support->Frame buffer Devices->Console display driver support->Framebuffer Console Support. Select 'y' to compile support statically or 'm' for module support. The module will be fbcon. - In order for fbcon to activate, at least one framebuffer driver is +In order for fbcon to activate, at least one framebuffer driver is required, so choose from any of the numerous drivers available. For x86 systems, they almost universally have VGA cards, so vga16fb and vesafb will always be available. However, using a chipset-specific driver will give you more speed and features, such as the ability to change the video mode dynamically. - To display the penguin logo, choose any logo available in Graphics +To display the penguin logo, choose any logo available in Graphics support->Bootup logo. 
- Also, you will need to select at least one compiled-in font, but if +Also, you will need to select at least one compiled-in font, but if you don't do anything, the kernel configuration tool will select one for you, usually an 8x16 font. @@ -44,6 +46,7 @@ fortunate to have a driver that does not alter the graphics chip, then you will still get a VGA console. B. Loading +========== Possible scenarios: @@ -72,33 +75,33 @@ Possible scenarios: C. Boot options - The framebuffer console has several, largely unknown, boot options - that can change its behavior. + The framebuffer console has several, largely unknown, boot options + that can change its behavior. 1. fbcon=font: - Select the initial font to use. The value 'name' can be any of the - compiled-in fonts: 10x18, 6x10, 7x14, Acorn8x8, MINI4x6, - PEARL8x8, ProFont6x11, SUN12x22, SUN8x16, VGA8x16, VGA8x8. + Select the initial font to use. The value 'name' can be any of the + compiled-in fonts: 10x18, 6x10, 7x14, Acorn8x8, MINI4x6, + PEARL8x8, ProFont6x11, SUN12x22, SUN8x16, VGA8x16, VGA8x8. Note, not all drivers can handle font with widths not divisible by 8, - such as vga16fb. + such as vga16fb. 2. fbcon=scrollback:[k] - The scrollback buffer is memory that is used to preserve display - contents that has already scrolled past your view. This is accessed - by using the Shift-PageUp key combination. The value 'value' is any - integer. It defaults to 32KB. The 'k' suffix is optional, and will - multiply the 'value' by 1024. + The scrollback buffer is memory that is used to preserve display + contents that has already scrolled past your view. This is accessed + by using the Shift-PageUp key combination. The value 'value' is any + integer. It defaults to 32KB. The 'k' suffix is optional, and will + multiply the 'value' by 1024. 3. fbcon=map:<0123> - This is an interesting option. It tells which driver gets mapped to - which console. 
The value '0123' is a sequence that gets repeated until - the total length is 64 which is the number of consoles available. In - the above example, it is expanded to 012301230123... and the mapping - will be: + This is an interesting option. It tells which driver gets mapped to + which console. The value '0123' is a sequence that gets repeated until + the total length is 64 which is the number of consoles available. In + the above example, it is expanded to 012301230123... and the mapping + will be:: tty | 1 2 3 4 5 6 7 8 9 ... fb | 0 1 2 3 0 1 2 3 0 ... @@ -126,20 +129,20 @@ C. Boot options 4. fbcon=rotate: - This option changes the orientation angle of the console display. The - value 'n' accepts the following: + This option changes the orientation angle of the console display. The + value 'n' accepts the following: - 0 - normal orientation (0 degree) - 1 - clockwise orientation (90 degrees) - 2 - upside down orientation (180 degrees) - 3 - counterclockwise orientation (270 degrees) + - 0 - normal orientation (0 degree) + - 1 - clockwise orientation (90 degrees) + - 2 - upside down orientation (180 degrees) + - 3 - counterclockwise orientation (270 degrees) The angle can be changed anytime afterwards by 'echoing' the same numbers to any one of the 2 attributes found in /sys/class/graphics/fbcon: - rotate - rotate the display of the active console - rotate_all - rotate the display of all consoles + - rotate - rotate the display of the active console + - rotate_all - rotate the display of all consoles Console rotation will only become available if Framebuffer Console Rotation support is compiled in your kernel. @@ -177,9 +180,9 @@ Before going on to how to attach, detach and unload the framebuffer console, an illustration of the dependencies may help. The console layer, as with most subsystems, needs a driver that interfaces with -the hardware. Thus, in a VGA console: +the hardware. Thus, in a VGA console:: -console ---> VGA driver ---> hardware. 
+ console ---> VGA driver ---> hardware. Assuming the VGA driver can be unloaded, one must first unbind the VGA driver from the console layer before unloading the driver. The VGA driver cannot be @@ -187,9 +190,9 @@ unloaded if it is still bound to the console layer. (See Documentation/console/console.txt for more information). This is more complicated in the case of the framebuffer console (fbcon), -because fbcon is an intermediate layer between the console and the drivers: +because fbcon is an intermediate layer between the console and the drivers:: -console ---> fbcon ---> fbdev drivers ---> hardware + console ---> fbcon ---> fbdev drivers ---> hardware The fbdev drivers cannot be unloaded if bound to fbcon, and fbcon cannot be unloaded if it's bound to the console layer. @@ -204,12 +207,12 @@ So, how do we unbind fbcon from the console? Part of the answer is in Documentation/console/console.txt. To summarize: Echo a value to the bind file that represents the framebuffer console -driver. So assuming vtcon1 represents fbcon, then: +driver. So assuming vtcon1 represents fbcon, then:: -echo 1 > sys/class/vtconsole/vtcon1/bind - attach framebuffer console to - console layer -echo 0 > sys/class/vtconsole/vtcon1/bind - detach framebuffer console from - console layer + echo 1 > sys/class/vtconsole/vtcon1/bind - attach framebuffer console to + console layer + echo 0 > sys/class/vtconsole/vtcon1/bind - detach framebuffer console from + console layer If fbcon is detached from the console layer, your boot console driver (which is usually VGA text mode) will take over. A few drivers (rivafb and i810fb) will @@ -223,19 +226,19 @@ restored properly. The following is one of the several methods that you can do: 2. In your kernel configuration, ensure that CONFIG_FRAMEBUFFER_CONSOLE is set to 'y' or 'm'. Enable one or more of your favorite framebuffer drivers. -3. Boot into text mode and as root run: +3. 
Boot into text mode and as root run:: vbetool vbestate save > - The above command saves the register contents of your graphics - hardware to . You need to do this step only once as - the state file can be reused. + The above command saves the register contents of your graphics + hardware to . You need to do this step only once as + the state file can be reused. -4. If fbcon is compiled as a module, load fbcon by doing: +4. If fbcon is compiled as a module, load fbcon by doing:: modprobe fbcon -5. Now to detach fbcon: +5. Now to detach fbcon:: vbetool vbestate restore < && \ echo 0 > /sys/class/vtconsole/vtcon1/bind @@ -243,7 +246,7 @@ restored properly. The following is one of the several methods that you can do: 6. That's it, you're back to VGA mode. And if you compiled fbcon as a module, you can unload it by 'rmmod fbcon'. -7. To reattach fbcon: +7. To reattach fbcon:: echo 1 > /sys/class/vtconsole/vtcon1/bind @@ -266,82 +269,82 @@ the following: Variation 1: - a. Before detaching fbcon, do + a. Before detaching fbcon, do:: - vbetool vbemode save > # do once for each vesafb mode, - # the file can be reused + vbetool vbemode save > # do once for each vesafb mode, + # the file can be reused b. Detach fbcon as in step 5. - c. Attach fbcon + c. Attach fbcon:: - vbetool vbestate restore < && \ + vbetool vbestate restore < && \ echo 1 > /sys/class/vtconsole/vtcon1/bind Variation 2: - a. Before detaching fbcon, do: + a. Before detaching fbcon, do:: + echo > /sys/class/tty/console/bind - - vbetool vbemode get + vbetool vbemode get b. Take note of the mode number b. Detach fbcon as in step 5. - c. Attach fbcon: + c. 
Attach fbcon:: - vbetool vbemode set && \ - echo 1 > /sys/class/vtconsole/vtcon1/bind + vbetool vbemode set && \ + echo 1 > /sys/class/vtconsole/vtcon1/bind Samples: ======== Here are 2 sample bash scripts that you can use to bind or unbind the -framebuffer console driver if you are on an X86 box: +framebuffer console driver if you are on an X86 box:: ---------------------------------------------------------------------------- -#!/bin/bash -# Unbind fbcon + #!/bin/bash + # Unbind fbcon -# Change this to where your actual vgastate file is located -# Or Use VGASTATE=$1 to indicate the state file at runtime -VGASTATE=/tmp/vgastate + # Change this to where your actual vgastate file is located + # Or Use VGASTATE=$1 to indicate the state file at runtime + VGASTATE=/tmp/vgastate -# path to vbetool -VBETOOL=/usr/local/bin + # path to vbetool + VBETOOL=/usr/local/bin -for (( i = 0; i < 16; i++)) -do - if test -x /sys/class/vtconsole/vtcon$i; then - if [ `cat /sys/class/vtconsole/vtcon$i/name | grep -c "frame buffer"` \ - = 1 ]; then + for (( i = 0; i < 16; i++)) + do + if test -x /sys/class/vtconsole/vtcon$i; then + if [ `cat /sys/class/vtconsole/vtcon$i/name | grep -c "frame buffer"` \ + = 1 ]; then if test -x $VBETOOL/vbetool; then echo Unbinding vtcon$i $VBETOOL/vbetool vbestate restore < $VGASTATE echo 0 > /sys/class/vtconsole/vtcon$i/bind fi - fi - fi -done + fi + fi + done --------------------------------------------------------------------------- -#!/bin/bash -# Bind fbcon -for (( i = 0; i < 16; i++)) -do - if test -x /sys/class/vtconsole/vtcon$i; then - if [ `cat /sys/class/vtconsole/vtcon$i/name | grep -c "frame buffer"` \ - = 1 ]; then +:: + + #!/bin/bash + # Bind fbcon + + for (( i = 0; i < 16; i++)) + do + if test -x /sys/class/vtconsole/vtcon$i; then + if [ `cat /sys/class/vtconsole/vtcon$i/name | grep -c "frame buffer"` \ + = 1 ]; then echo Unbinding vtcon$i echo 1 > /sys/class/vtconsole/vtcon$i/bind - fi - fi -done 
---------------------------------------------------------------------------- + fi + fi + done --- Antonino Daplas diff --git a/Documentation/fb/framebuffer.txt b/Documentation/fb/framebuffer.rst similarity index 92% rename from Documentation/fb/framebuffer.txt rename to Documentation/fb/framebuffer.rst index 58c5ae2e9f59..7fe087310c82 100644 --- a/Documentation/fb/framebuffer.txt +++ b/Documentation/fb/framebuffer.rst @@ -1,7 +1,7 @@ - The Frame Buffer Device - ----------------------- +======================= +The Frame Buffer Device +======================= -Maintained by Geert Uytterhoeven Last revised: May 10, 2001 @@ -26,7 +26,7 @@ other device in /dev. It's a character device using major 29; the minor specifies the frame buffer number. By convention, the following device nodes are used (numbers indicate the device -minor numbers): +minor numbers):: 0 = /dev/fb0 First frame buffer 1 = /dev/fb1 Second frame buffer @@ -34,15 +34,15 @@ minor numbers): 31 = /dev/fb31 32nd frame buffer For backwards compatibility, you may want to create the following symbolic -links: +links:: /dev/fb0current -> fb0 /dev/fb1current -> fb1 and so on... -The frame buffer devices are also `normal' memory devices, this means, you can -read and write their contents. You can, for example, make a screen snapshot by +The frame buffer devices are also `normal` memory devices, this means, you can +read and write their contents. You can, for example, make a screen snapshot by:: cp /dev/fb0 myfile @@ -54,11 +54,11 @@ Application software that uses the frame buffer device (e.g. the X server) will use /dev/fb0 by default (older software uses /dev/fb0current). You can specify an alternative frame buffer device by setting the environment variable $FRAMEBUFFER to the path name of a frame buffer device, e.g. (for sh/bash -users): +users):: export FRAMEBUFFER=/dev/fb1 -or (for csh users): +or (for csh users):: setenv FRAMEBUFFER /dev/fb1 @@ -90,9 +90,9 @@ which data structures they work. 
Here's just a brief overview: possible). - You can get and set parts of the color map. Communication is done with 16 - bits per color part (red, green, blue, transparency) to support all - existing hardware. The driver does all the computations needed to apply - it to the hardware (round it down to less bits, maybe throw away + bits per color part (red, green, blue, transparency) to support all + existing hardware. The driver does all the computations needed to apply + it to the hardware (round it down to less bits, maybe throw away transparency). All this hardware abstraction makes the implementation of application programs @@ -113,10 +113,10 @@ much trouble... 3. Frame Buffer Resolution Maintenance -------------------------------------- -Frame buffer resolutions are maintained using the utility `fbset'. It can +Frame buffer resolutions are maintained using the utility `fbset`. It can change the video mode properties of a frame buffer device. Its main usage is -to change the current video mode, e.g. during boot up in one of your /etc/rc.* -or /etc/init.d/* files. +to change the current video mode, e.g. during boot up in one of your `/etc/rc.*` +or `/etc/init.d/*` files. Fbset uses a video mode database stored in a configuration file, so you can easily add your own modes and refer to them with a simple identifier. @@ -129,8 +129,8 @@ The X server (XF68_FBDev) is the most notable application program for the frame buffer device. Starting with XFree86 release 3.2, the X server is part of XFree86 and has 2 modes: - - If the `Display' subsection for the `fbdev' driver in the /etc/XF86Config - file contains a + - If the `Display` subsection for the `fbdev` driver in the /etc/XF86Config + file contains a:: Modes "default" @@ -146,7 +146,7 @@ XFree86 and has 2 modes: same virtual desktop size. The frame buffer device that's used is still /dev/fb0current (or $FRAMEBUFFER), but the available resolutions are defined by /etc/XF86Config now. 
The disadvantage is that you have to - specify the timings in a different format (but `fbset -x' may help). + specify the timings in a different format (but `fbset -x` may help). To tune a video mode, you can use fbset or xvidtune. Note that xvidtune doesn't work 100% with XF68_FBDev: the reported clock values are always incorrect. @@ -172,29 +172,29 @@ retrace, the electron beam is turned off (blanked). The speed at which the electron beam paints the pixels is determined by the dotclock in the graphics board. For a dotclock of e.g. 28.37516 MHz (millions -of cycles per second), each pixel is 35242 ps (picoseconds) long: +of cycles per second), each pixel is 35242 ps (picoseconds) long:: 1/(28.37516E6 Hz) = 35.242E-9 s -If the screen resolution is 640x480, it will take +If the screen resolution is 640x480, it will take:: 640*35.242E-9 s = 22.555E-6 s to paint the 640 (xres) pixels on one scanline. But the horizontal retrace -also takes time (e.g. 272 `pixels'), so a full scanline takes +also takes time (e.g. 272 `pixels`), so a full scanline takes:: (640+272)*35.242E-9 s = 32.141E-6 s -We'll say that the horizontal scanrate is about 31 kHz: +We'll say that the horizontal scanrate is about 31 kHz:: 1/(32.141E-6 s) = 31.113E3 Hz A full screen counts 480 (yres) lines, but we have to consider the vertical -retrace too (e.g. 49 `lines'). So a full screen will take +retrace too (e.g. 49 `lines`). So a full screen will take:: (480+49)*32.141E-6 s = 17.002E-3 s -The vertical scanrate is about 59 Hz: +The vertical scanrate is about 59 Hz:: 1/(17.002E-3 s) = 58.815 Hz @@ -212,7 +212,7 @@ influenced by the moments at which the synchronization pulses occur. The following picture summarizes all timings. The horizontal retrace time is the sum of the left margin, the right margin and the hsync length, while the vertical retrace time is the sum of the upper margin, the lower margin and the -vsync length. 
+vsync length:: +----------+---------------------------------------------+----------+-------+ | | ↑ | | | @@ -256,7 +256,8 @@ The frame buffer device expects all horizontal timings in number of dotclocks 6. Converting XFree86 timing values info frame buffer device timings -------------------------------------------------------------------- -An XFree86 mode line consists of the following fields: +An XFree86 mode line consists of the following fields:: + "800x600" 50 800 856 976 1040 600 637 643 666 < name > DCF HR SH1 SH2 HFL VR SV1 SV2 VFL @@ -271,19 +272,27 @@ The frame buffer device uses the following fields: - vsync_len: length of vertical sync 1) Pixelclock: + xfree: in MHz + fb: in picoseconds (ps) pixclock = 1000000 / DCF 2) horizontal timings: + left_margin = HFL - SH2 + right_margin = SH1 - HR + hsync_len = SH2 - SH1 3) vertical timings: + upper_margin = VFL - SV2 + lower_margin = SV1 - VR + vsync_len = SV2 - SV1 Good examples for VESA timings can be found in the XFree86 source tree, @@ -303,9 +312,10 @@ and to the following documentation: - The manual pages for fbset: fbset(8), fb.modes(5) - The manual pages for XFree86: XF68_FBDev(1), XF86Config(4/5) - The mighty kernel sources: - o linux/drivers/video/ - o linux/include/linux/fb.h - o linux/include/video/ + + - linux/drivers/video/ + - linux/include/linux/fb.h + - linux/include/video/ @@ -330,14 +340,14 @@ and on its mirrors. The latest version of fbset can be found at - http://www.linux-fbdev.org/ + http://www.linux-fbdev.org/ + + +10. Credits +----------- - -10. Credits ----------- - This readme was written by Geert Uytterhoeven, partly based on the original -`X-framebuffer.README' by Roman Hodek and Martin Schaller. Section 6 was +`X-framebuffer.README` by Roman Hodek and Martin Schaller. Section 6 was provided by Frank Neumann. The frame buffer device abstraction was designed by Martin Schaller. 
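Reviewer's note (not part of the patch): the conversion rules in section 6 and the scan rate arithmetic in the timing section of framebuffer.rst are simple enough to sketch in a few lines. The example below is a minimal illustration under stated assumptions: the function names and dictionary keys are hypothetical, and treating HR and VR as xres and yres is inferred from the "800x600" sample mode line and is not spelled out in the text.

```python
# Illustrative sketch only: function names and dict keys are hypothetical.


def modeline_to_fb(dcf_mhz, hr, sh1, sh2, hfl, vr, sv1, sv2, vfl):
    """Convert XFree86 mode line fields to frame buffer timings.

    Follows the rules listed in section 6: the pixel clock goes from MHz
    to picoseconds, and the margins and sync lengths are differences of
    the mode line fields.
    """
    return {
        "xres": hr,                              # assumption: HR is xres
        "yres": vr,                              # assumption: VR is yres
        "pixclock": round(1_000_000 / dcf_mhz),  # picoseconds per pixel
        "left_margin": hfl - sh2,
        "right_margin": sh1 - hr,
        "hsync_len": sh2 - sh1,
        "upper_margin": vfl - sv2,
        "lower_margin": sv1 - vr,
        "vsync_len": sv2 - sv1,
    }


def scan_rates(pixclock_ps, htotal, vtotal):
    """Return (horizontal Hz, vertical Hz) for a pixel clock given in ps.

    htotal and vtotal include the retrace periods, as in the document's
    (640+272) and (480+49) worked example.
    """
    hfreq = 1e12 / (pixclock_ps * htotal)
    vfreq = hfreq / vtotal
    return hfreq, vfreq


if __name__ == "__main__":
    # The sample mode line from section 6.
    fb = modeline_to_fb(50, 800, 856, 976, 1040, 600, 637, 643, 666)
    print(fb)

    # The 640x480 example from the timing section (28.37516 MHz dotclock).
    pixclock = round(1_000_000 / 28.37516)
    print(scan_rates(pixclock, 640 + 272, 480 + 49))
```

For the sample mode line, the visible resolution plus both margins and the sync length adds back up to HFL (1040) horizontally and VFL (666) vertically, which is a quick sanity check on any converted mode.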
diff --git a/Documentation/fb/gxfb.txt b/Documentation/fb/gxfb.rst similarity index 60% rename from Documentation/fb/gxfb.txt rename to Documentation/fb/gxfb.rst index 2f640903bbb2..5738709bccbb 100644 --- a/Documentation/fb/gxfb.txt +++ b/Documentation/fb/gxfb.rst @@ -1,7 +1,8 @@ -[This file is cloned from VesaFB/aty128fb] - +============= What is gxfb? -================= +============= + +.. [This file is cloned from VesaFB/aty128fb] This is a graphics framebuffer driver for AMD Geode GX2 based processors. @@ -23,9 +24,9 @@ How to use it? ============== Switching modes is done using gxfb.mode_option=... boot -parameter or using `fbset' program. +parameter or using `fbset` program. -See Documentation/fb/modedb.txt for more information on modedb +See Documentation/fb/modedb.rst for more information on modedb resolutions. @@ -42,11 +43,12 @@ You can pass kernel command line options to gxfb with gxfb.