1
0
Fork 0
Commit Graph

68 Commits (redonkable)

Author SHA1 Message Date
Dmitry Monakhov b28df8395d quota: Check that quota is not dirty before release
commit df4bb5d128 upstream.

There is a race window where quota was redirted once we drop dq_list_lock inside dqput(),
but before we grab dquot->dq_lock inside dquot_release()

TASK1                                                       TASK2 (chowner)
->dqput()
  we_slept:
    spin_lock(&dq_list_lock)
    if (dquot_dirty(dquot)) {
          spin_unlock(&dq_list_lock);
          dquot->dq_sb->dq_op->write_dquot(dquot);
          goto we_slept
    if (test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) {
          spin_unlock(&dq_list_lock);
          dquot->dq_sb->dq_op->release_dquot(dquot);
                                                            dqget()
							    mark_dquot_dirty()
							    dqput()
          goto we_slept;
        }
So dquot dirty quota will be released by TASK1, but on next we_sleept loop
we detect this and call ->write_dquot() for it.
XFSTEST: 440a80d4cb

Link: https://lore.kernel.org/r/20191031103920.3919-2-dmonakhov@openvz.org
CC: stable@vger.kernel.org
Signed-off-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-17 19:56:43 +01:00
Jeff Layton cc56c33e78 ocfs2: convert to new i_version API
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
2018-01-29 06:42:21 -05:00
Greg Kroah-Hartman b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Jan Kara 7b9ca4c61b quota: Reduce contention on dq_data_lock
dq_data_lock is currently used to protect all modifications of quota
accounting information, consistency of quota accounting on the inode,
and dquot pointers from inode. As a result contention on the lock can be
pretty heavy.

Reduce the contention on the lock by protecting quota accounting
information by a new dquot->dq_dqb_lock and consistency of quota
accounting with inode usage by inode->i_lock.

This change reduces time to create 500000 files on ext4 on ramdisk by 50
different processes in separate directories by 6% when user quota is
turned on. When those 50 processes belong to 50 different users, the
improvement is about 9%.

Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 22:07:59 +02:00
Jan Kara 9a8ae30e73 quota: Push dqio_sem down to ->write_file_info()
Push down acquisition of dqio_sem into ->write_file_info() callback.
Mostly for consistency with other operations.

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 19:11:23 +02:00
Jan Kara bc8230ee8e quota: Convert dqio_mutex to rwsem
Convert dqio_mutex to rwsem and call it dqio_sem. No functional changes
yet.

Signed-off-by: Jan Kara <jack@suse.cz>
2017-08-17 18:52:48 +02:00
Jan Kara 2a64f80ee3 ocfs2: Protect periodic quota syncing with s_umount semaphore
New quota locking rules will require s_umount semaphore for all quota
scanning functions. Add is for periodic quota syncing.

Tested-by: Eric Ren <zren@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2016-11-30 08:36:54 +01:00
Arnd Bergmann e008bb6134 quota: use time64_t internally
The quota subsystem has two formats, the old v1 format using architecture
specific time_t values on the on-disk format, while the v2 format
(introduced in Linux 2.5.16 and 2.4.22) uses fixed 64-bit little-endian.

While there is no future for the v1 format beyond y2038, the v2 format
is almost there on 32-bit architectures, as both the user interface
and the on-disk format use 64-bit timestamps, just not the time_t
inbetween.

This changes the internal representation to use time64_t, which will
end up doing the right thing everywhere for v2 format.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2016-06-19 18:09:31 +02:00
Jan Kara 8f9e8f5fcc ocfs2: Fix Q_GETNEXTQUOTA for filesystem without quotas
When Q_GETNEXTQUOTA was called for filesystem with quotas disabled
ocfs2_get_next_id() oopses. Fix the problem by checking early whether
the filesystem has quotas enabled.

Signed-off-by: Jan Kara <jack@suse.cz>
2016-03-29 17:44:11 +02:00
jiangyiwen 35ddf78e41 ocfs2: fix occurring deadlock by changing ocfs2_wq from global to local
This patch fixes a deadlock, as follows:

  Node 1                Node 2                  Node 3
1)volume a and b are    only mount vol a        only mount vol b
  mounted

2)                      start to mount b        start to mount a

3)                      check hb of Node 3      check hb of Node 2
                        in vol a, qs_holds++    in vol b, qs_holds++

4) -------------------- all nodes' network down --------------------

5)                      progress of mount b     the same situation as
                        failed, and then call   Node 2
                        ocfs2_dismount_volume.
                        but the process is hung,
                        since there is a work
                        in ocfs2_wq cannot beo
                        completed. This work is
                        about vol a, because
                        ocfs2_wq is global wq.
                        BTW, this work which is
                        scheduled in ocfs2_wq is
                        ocfs2_orphan_scan_work,
                        and the context in this work
                        needs to take inode lock
                        of orphan_dir, because
                        lockres owner are Node 1 and
                        all nodes' nework has been down
                        at the same time, so it can't
                        get the inode lock.

6)                      Why can't this node be fenced
                        when network disconnected?
                        Because the process of
                        mount is hung what caused qs_holds
                        is not equal 0.

Because all works in the ocfs2_wq are relative to the super block.

The solution is to change the ocfs2_wq from global to local.  In other
words, move it into struct ocfs2_super.

Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Xue jiufei <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25 16:37:42 -07:00
Jan Kara 5a9530e498 ocfs2: Implement get_next_id()
Implement get_next_id() callback to enable use of Q_GETNEXTQUOTA
quotactl for OCFS2.

Signed-off-by: Jan Kara <jack@suse.cz>
2016-02-09 13:05:23 +01:00
Al Viro 5955102c99 wrappers for ->i_mutex access
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-22 18:04:28 -05:00
Julia Lawall d1b98c23f7 quota: constify qtree_fmt_operations structures
The qtree_fmt_operations structures are never modified, so declare them as
const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Jan Kara <jack@suse.cz>
2016-01-04 10:58:35 +01:00
Jan Kara 52362810be ocfs2: Don't use MAXQUOTAS value
MAXQUOTAS value defines maximum number of quota types VFS supports.
This isn't necessarily the number of types ocfs2 supports and with
addition of project quotas these two numbers stop matching. So make
ocfs2 use its private definition.

CC: Mark Fasheh <mfasheh@suse.com>
CC: Joel Becker <jlbec@evilplan.org>
CC: ocfs2-devel@oss.oracle.com
Signed-off-by: Jan Kara <jack@suse.cz>
2014-09-17 11:59:12 +02:00
Jan Kara e3a767b60f ocfs2: implement delayed dropping of last dquot reference
We cannot drop last dquot reference from downconvert thread as that
creates the following deadlock:

NODE 1                                  NODE2
holds dentry lock for 'foo'
holds inode lock for GLOBAL_BITMAP_SYSTEM_INODE
                                        dquot_initialize(bar)
                                          ocfs2_dquot_acquire()
                                            ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
                                            ...
downconvert thread (triggered from another
node or a different process from NODE2)
  ocfs2_dentry_post_unlock()
    ...
    iput(foo)
      ocfs2_evict_inode(foo)
        ocfs2_clear_inode(foo)
          dquot_drop(inode)
            ...
	    ocfs2_dquot_release()
              ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
               - blocks
                                            finds we need more space in
                                            quota file
                                            ...
                                            ocfs2_extend_no_holes()
                                              ocfs2_inode_lock(GLOBAL_BITMAP_SYSTEM_INODE)
                                                - deadlocks waiting for
                                                  downconvert thread

We solve the problem by postponing dropping of the last dquot reference to
a workqueue if it happens from the downconvert thread.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:20:54 -07:00
Jan Kara 15c34a7606 ocfs2: fix quota file corruption
Global quota files are accessed from different nodes.  Thus we cannot
cache offset of quota structure in the quota file after we drop our node
reference count to it because after that moment quota structure may be
freed and reallocated elsewhere by a different node resulting in
corruption of quota file.

Fix the problem by clearing dq_off when we are releasing dquot structure.
We also remove the DB_READ_B handling because it is useless -
DQ_ACTIVE_B is set iff DQ_READ_B is set.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-04 07:55:48 -08:00
Junxiao Bi f17c20dd2e ocfs2: use i_size_read() to access i_size
Though ocfs2 uses inode->i_mutex to protect i_size, there are both
i_size_read/write() and direct accesses.  Clean up all direct access to
eliminate confusion.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Jie Liu <jeff.liu@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:56:30 -07:00
Eric W. Biederman 4c376dcae8 userns: Convert struct dquot dq_id to be a struct kqid
Change struct dquot dq_id to a struct kqid and remove the now
unecessary dq_type.

Make minimal changes to dquot, quota_tree, quota_v1, quota_v2, ext3,
ext4, and ocfs2 to deal with the change in quota structures and
signatures.  The ocfs2 changes are larger than most because of the
extensive tracing throughout the ocfs2 quota code that prints out
dq_id.

quota_tree.c:get_index is modified to take a struct kqid instead of a
qid_t because all of it's callers pass in dquot->dq_id and it allows
me to introduce only a single conversion.

The rest of the changes are either just replacing dq_type with dq_id.type,
adding conversions to deal with the change in type and occassionally
adding qid_eq to allow quota id comparisons in a user namespace safe way.

Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Theodore Tso <tytso@mit.edu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-09-18 01:01:41 -07:00
Jan Kara a4564ead76 ocfs2: Fix bogus error message from ocfs2_global_read_info
'status' variable in ocfs2_global_read_info() is always != 0 when leaving the
function because it happens to contain number of read bytes. Thus we always log
error message although everything is OK. Since all error cases properly call
mlog_errno() before jumping to out_err, there's no reason to call mlog_errno()
on exit at all. This is a fallout of c1e8d35e (conversion of mlog_exit()
calls).

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
2012-07-03 23:27:17 -07:00
Lucas De Marchi 25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Linus Torvalds 03e4970c10 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (39 commits)
  Treat writes as new when holes span across page boundaries
  fs,ocfs2: Move o2net_get_func_run_time under CONFIG_OCFS2_FS_STATS.
  ocfs2/dlm: Move kmalloc() outside the spinlock
  ocfs2: Make the left masklogs compat.
  ocfs2: Remove masklog ML_AIO.
  ocfs2: Remove masklog ML_UPTODATE.
  ocfs2: Remove masklog ML_BH_IO.
  ocfs2: Remove masklog ML_JOURNAL.
  ocfs2: Remove masklog ML_EXPORT.
  ocfs2: Remove masklog ML_DCACHE.
  ocfs2: Remove masklog ML_NAMEI.
  ocfs2: Remove mlog(0) from fs/ocfs2/dir.c
  ocfs2: remove NAMEI from symlink.c
  ocfs2: Remove masklog ML_QUOTA.
  ocfs2: Remove mlog(0) from quota_local.c.
  ocfs2: Remove masklog ML_RESERVATIONS.
  ocfs2: Remove masklog ML_XATTR.
  ocfs2: Remove masklog ML_SUPER.
  ocfs2: Remove mlog(0) from fs/ocfs2/heartbeat.c
  ocfs2: Remove mlog(0) from fs/ocfs2/slot_map.c
  ...

Fix up trivial conflict in fs/ocfs2/super.c
2011-03-28 13:03:31 -07:00
Tao Ma 1db986a839 ocfs2: Remove masklog ML_QUOTA.
Remove mlog(0) from fs/ocfs2/quota_global.c and
the masklog QUOTA.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
2011-02-23 22:19:12 +08:00
Tao Ma c1e8d35ef5 ocfs2: Remove EXIT from masklog.
mlog_exit is used to record the exit status of a function.
But because it is added in so many functions, if we enable it,
the system logs get filled up quickly and cause too much I/O.
So actually no one can open it for a production system or even
for a test.

This patch just try to remove it or change it. So:
1. if all the error paths already use mlog_errno, it is just removed.
   Otherwise, it will be replaced by mlog_errno.
2. if it is used to print some return value, it is replaced with
   mlog(0,...).
mlog_exit_ptr is changed to mlog(0.
All those mlog(0,...) will be replaced with trace events later.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
2011-03-07 16:43:21 +08:00
Tao Ma ef6b689b63 ocfs2: Remove ENTRY from masklog.
ENTRY is used to record the entry of a function.
But because it is added in so many functions, if we enable it,
the system logs get filled up quickly and cause too much I/O.
So actually no one can open it for a production system or even
for a test.

So for mlog_entry_void, we just remove it.
for mlog_entry(...), we replace it with mlog(0,...), and they
will be replace by trace event later.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
2011-02-21 11:10:44 +08:00
Tejun Heo 316873c958 ocfs2: use system_wq instead of ocfs2_quota_wq
ocfs2_quota_wq is not depended upon during memory reclaim and, with
cmwq, there's no reason to use a dedicated workqueue.  Drop
ocfs2_quota_wq and use system_wq instead.  dqi_sync_work is already
sync canceled on quota disable and no further synchronization is
necessary.

This change makes ocfs2_quota_setup/shutdown() noops.  Both functions
removed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
2011-02-01 11:42:42 +01:00
Joel Becker 5693486bad ocfs2: Zero the tail cluster when extending past i_size.
ocfs2's allocation unit is the cluster.  This can be larger than a block
or even a memory page.  This means that a file may have many blocks in
its last extent that are beyond the block containing i_size.  There also
may be more unwritten extents after that.

When ocfs2 grows a file, it zeros the entire cluster in order to ensure
future i_size growth will see cleared blocks.  Unfortunately,
block_write_full_page() drops the pages past i_size.  This means that
ocfs2 is actually leaking garbage data into the tail end of that last
cluster.  This is a bug.

We adjust ocfs2_write_begin_nolock() and ocfs2_extend_file() to detect
when a write or truncate is past i_size.  They will use
ocfs2_zero_extend() to ensure the data is properly zeroed.

Older versions of ocfs2_zero_extend() simply zeroed every block between
i_size and the zeroing position.  This presumes three things:

1) There is allocation for all of these blocks.
2) The extents are not unwritten.
3) The extents are not refcounted.

(1) and (2) hold true for non-sparse filesystems, which used to be the
only users of ocfs2_zero_extend().  (3) is another bug.

Since we're now using ocfs2_zero_extend() for sparse filesystems as
well, we teach ocfs2_zero_extend() to check every extent between
i_size and the zeroing position.  If the extent is unwritten, it is
ignored.  If it is refcounted, it is CoWed.  Then it is zeroed.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Cc: stable@kernel.org
2010-07-08 13:25:35 -07:00
Jan Kara 741e128933 ocfs2: Fix NULL pointer deref when writing local dquot
commit_dqblk() can write quota info to global file. That is actually a bad
thing to do because if we are just modifying local quota file, we are not
prepared (do not hold proper locks, do not have transaction credits) to do
a modification of the global quota file. So do not use commit_dqblk() and
instead call our writing function directly.

Acked-by: Joel Becker <Joel.Becker@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-21 19:30:48 +02:00
Jan Kara 832d09cf14 ocfs2: Fix estimate of credits needed for quota allocation
We were missing reservation of a journal credit for modification of quota
file inode when creating new dquot structure in the global quota file.

Acked-by: Joel Becker <Joel.Becker@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-21 19:30:47 +02:00
Jan Kara fb8dd8d780 ocfs2: Fix quota locking
OCFS2 had three issues with quota locking:
a) When reading dquot from global quota file, we started a transaction while
   holding dqio_mutex which is prone to deadlocks because other paths do it
   the other way around
b) During ocfs2_sync_dquot we were not protected against concurrent writers
   on the same node. Because we first copy data to local buffer, a race
   could happen resulting in old data being written to global quota file and
   thus causing quota inconsistency after a crash.
c) ip_alloc_sem of quota files was acquired while a transaction is started
   in ocfs2_quota_write which can deadlock because we first get ip_alloc_sem
   and then start a transaction when extending quota files.

We fix the problem a) by pulling all necessary code to ocfs2_acquire_dquot
and ocfs2_release_dquot. Thus we no longer depend on generic dquot_acquire
to do the locking and can force proper lock ordering.

Problems b) and c) are fixed by locking i_mutex and ip_alloc_sem of
global quota file in ocfs2_lock_global_qf and removing ip_alloc_sem from
ocfs2_quota_read and ocfs2_quota_write.

Acked-by: Joel Becker <Joel.Becker@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-21 19:30:47 +02:00
Jan Kara ae4f6ef134 ocfs2: Avoid unnecessary block mapping when refreshing quota info
The position of global quota file info does not change. So we do not have
to do logical -> physical block translation every time we reread it from
disk. Thus we can also avoid taking ip_alloc_sem.

Acked-by: Joel Becker <Joel.Becker@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-21 19:30:46 +02:00
Jan Kara f64dd44eb7 ocfs2: Do not map blocks from local quota file on each write
There is no need to map offset of local dquot structure to on disk block
in each quota write. It is enough to map it just once and store the physical
block number in quota structure in memory. Moreover this simplifies locking
as we do not have to take ip_alloc_sem from quota write path.

Acked-by: Joel Becker <Joel.Becker@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-21 19:30:46 +02:00
Linus Torvalds 03e62303cf Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (47 commits)
  ocfs2: Silence a gcc warning.
  ocfs2: Don't retry xattr set in case value extension fails.
  ocfs2:dlm: avoid dlm->ast_lock lockres->spinlock dependency break
  ocfs2: Reset xattr value size after xa_cleanup_value_truncate().
  fs/ocfs2/dlm: Use kstrdup
  fs/ocfs2/dlm: Drop memory allocation cast
  Ocfs2: Optimize punching-hole code.
  Ocfs2: Make ocfs2_find_cpos_for_left_leaf() public.
  Ocfs2: Fix hole punching to correctly do CoW during cluster zeroing.
  Ocfs2: Optimize ocfs2 truncate to use ocfs2_remove_btree_range() instead.
  ocfs2: Block signals for mkdir/link/symlink/O_CREAT.
  ocfs2: Wrap signal blocking in void functions.
  ocfs2/dlm: Increase o2dlm lockres hash size
  ocfs2: Make ocfs2_extend_trans() really extend.
  ocfs2/trivial: Code cleanup for allocation reservation.
  ocfs2: make ocfs2_adjust_resv_from_alloc simple.
  ocfs2: Make nointr a default mount option
  ocfs2/dlm: Make o2dlm domain join/leave messages KERN_NOTICE
  o2net: log socket state changes
  ocfs2: print node # when tcp fails
  ...
2010-05-21 07:20:17 -07:00
Joel Becker ec20cec7a3 ocfs2: Make ocfs2_journal_dirty() void.
jbd[2]_journal_dirty_metadata() only returns 0.  It's been returning 0
since before the kernel moved to git.  There is no point in checking
this error.

ocfs2_journal_dirty() has been faithfully returning the status since the
beginning.  All over ocfs2, we have blocks of code checking this can't
fail status.  In the past few years, we've tried to avoid adding these
checks, because they are pointless.  But anyone who looks at our code
assumes they are needed.

Finally, ocfs2_journal_dirty() is made a void function.  All error
checking is removed from other files.  We'll BUG_ON() the status of
jbd2_journal_dirty_metadata() just in case they change it someday.  They
won't.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-05 18:17:29 -07:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Christoph Hellwig 871a293155 dquot: cleanup dquot initialize routine
Get rid of the initialize dquot operation - it is now always called from
the filesystem and if a filesystem really needs it's own (which none
currently does) it can just call into it's own routine directly.

Rename the now static low-level dquot_initialize helper to __dquot_initialize
and vfs_dq_init to dquot_initialize to have a consistent namespace.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05 00:20:30 +01:00
Christoph Hellwig 9f75475802 dquot: cleanup dquot drop routine
Get rid of the drop dquot operation - it is now always called from
the filesystem and if a filesystem really needs it's own (which none
currently does) it can just call into it's own routine directly.

Rename the now static low-level dquot_drop helper to __dquot_drop
and vfs_dq_drop to dquot_drop to have a consistent namespace.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05 00:20:30 +01:00
Christoph Hellwig b43fa8284d dquot: cleanup dquot transfer routine
Get rid of the transfer dquot operation - it is now always called from
the filesystem and if a filesystem really needs it's own (which none
currently does) it can just call into it's own routine directly.

Rename the now static low-level dquot_transfer helper to __dquot_transfer
and vfs_dq_transfer to dquot_transfer to have a consistent namespace,
and make the new dquot_transfer return a normal negative errno value
which all callers expect.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05 00:20:29 +01:00
Christoph Hellwig 63936ddaa1 dquot: cleanup inode allocation / freeing routines
Get rid of the alloc_inode and free_inode dquot operations - they are
always called from the filesystem and if a filesystem really needs
their own (which none currently does) it can just call into it's
own routine directly.

Also get rid of the vfs_dq_alloc/vfs_dq_free wrappers and always
call the lowlevel dquot_alloc_inode / dqout_free_inode routines
directly, which now lose the number argument which is always 1.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05 00:20:28 +01:00
Christoph Hellwig 5dd4056db8 dquot: cleanup space allocation / freeing routines
Get rid of the alloc_space, free_space, reserve_space, claim_space and
release_rsv dquot operations - they are always called from the filesystem
and if a filesystem really needs their own (which none currently does)
it can just call into it's own routine directly.

Move shared logic into the common __dquot_alloc_space,
dquot_claim_space_nodirty and __dquot_free_space low-level methods,
and rationalize the wrappers around it to move as much as possible
code into the common block for CONFIG_QUOTA vs not.  Also rename
all these helpers to be named dquot_* instead of vfs_dq_*.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05 00:20:28 +01:00
Linus Torvalds b64ada6b23 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (85 commits)
  ocfs2: Use buffer IO if we are appending a file.
  ocfs2: add spinlock protection when dealing with lockres->purge.
  dlmglue.c: add missed mlog lines
  ocfs2: __ocfs2_abort() should not enable panic for local mounts
  ocfs2: Add ioctl for reflink.
  ocfs2: Enable refcount tree support.
  ocfs2: Implement ocfs2_reflink.
  ocfs2: Add preserve to reflink.
  ocfs2: Create reflinked file in orphan dir.
  ocfs2: Use proper parameter for some inode operation.
  ocfs2: Make transaction extend more efficient.
  ocfs2: Don't merge in 1st refcount ops of reflink.
  ocfs2: Modify removing xattr process for refcount.
  ocfs2: Add reflink support for xattr.
  ocfs2: Create an xattr indexed block if needed.
  ocfs2: Call refcount tree remove process properly.
  ocfs2: Attach xattr clusters to refcount tree.
  ocfs2: Abstract ocfs2 xattr tree extend rec iteration process.
  ocfs2: Abstract the creation of xattr block.
  ocfs2: Remove inode from ocfs2_xattr_bucket_get_name_value.
  ...
2009-09-23 09:29:20 -07:00
Linus Torvalds 342ff1a1b5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
  trivial: fix typo in aic7xxx comment
  trivial: fix comment typo in drivers/ata/pata_hpt37x.c
  trivial: typo in kernel-parameters.txt
  trivial: fix typo in tracing documentation
  trivial: add __init/__exit macros in drivers/gpio/bt8xxgpio.c
  trivial: add __init macro/ fix of __exit macro location in ipmi_poweroff.c
  trivial: remove unnecessary semicolons
  trivial: Fix duplicated word "options" in comment
  trivial: kbuild: remove extraneous blank line after declaration of usage()
  trivial: improve help text for mm debug config options
  trivial: doc: hpfall: accept disk device to unload as argument
  trivial: doc: hpfall: reduce risk that hpfall can do harm
  trivial: SubmittingPatches: Fix reference to renumbered step
  trivial: fix typos "man[ae]g?ment" -> "management"
  trivial: media/video/cx88: add __init/__exit macros to cx88 drivers
  trivial: fix typo in CONFIG_DEBUG_FS in gcov doc
  trivial: fix missing printk space in amd_k7_smp_check
  trivial: fix typo s/ketymap/keymap/ in comment
  trivial: fix typo "to to" in multiple files
  trivial: fix typos in comments s/DGBU/DBGU/
  ...
2009-09-22 07:51:45 -07:00
Alexey Dobriyan 61e225dc34 const: make struct super_block::dq_op const
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22 07:17:24 -07:00
Joe Perches a419aef8b8 trivial: remove unnecessary semicolons
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-09-21 15:14:58 +02:00
Joel Becker 0cf2f7632b ocfs2: Pass struct ocfs2_caching_info to the journal functions.
The next step in divorcing metadata I/O management from struct inode is
to pass struct ocfs2_caching_info to the journal functions.  Thus the
journal locks a metadata cache with the cache io_lock function.  It also
can compare ci_last_trans and ci_created_trans directly.

This is a large patch because of all the places we change
ocfs2_journal_access..(handle, inode, ...) to
ocfs2_journal_access..(handle, INODE_CACHE(inode), ...).

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:50 -07:00
Joel Becker 8cb471e8f8 ocfs2: Take the inode out of the metadata read/write paths.
We are really passing the inode into the ocfs2_read/write_blocks()
functions to get at the metadata cache.  This commit passes the cache
directly into the metadata block functions, divorcing them from the
inode.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 16:07:48 -07:00
Jan Kara ada508274b ocfs2: Handle quota file corruption more gracefully
ocfs2_read_virt_blocks() does BUG when we try to read a block from a file
beyond its end. Since this can happen due to filesystem corruption, it
is not really an appropriate answer. Make ocfs2_read_quota_block() check
the condition and handle it by calling ocfs2_error() and returning EIO.

[ Modified to print ip_blkno in the error - Joel ]

Reported-by: Tristan Ye <tristan.ye@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-08-17 12:50:12 -07:00
Jan Kara b409d7a0ab ocfs2: Fix possible deadlock when extending quota file
In OCFS2, allocator locks rank above transaction start. Thus we
cannot extend quota file from inside a transaction less we could
deadlock.

We solve the problem by starting transaction not already in
ocfs2_acquire_dquot() but only in ocfs2_local_read_dquot() and
ocfs2_global_read_dquot() and we allocate blocks to quota files before starting
the transaction.  In case we crash, quota files will just have a few blocks
more but that's no problem since we just use them next time we extend the
quota file.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-08-10 12:20:22 -07:00
Tao Ma e9956fae7d ocfs2/quota: Release lock for error in ocfs2_quota_write.
ocfs2_quota_write needs to release the lock if it fails to
read quota block. So use "goto out" instead of "return err".

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-30 11:06:06 -07:00
Jan Kara 0584974a77 ocfs2: Define credit counts for quota operations
Numbers of needed credits for some quota operations were written
as raw numbers. Create appropriate defines instead.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-23 10:59:31 -07:00
Jan Kara 4539f1df25 ocfs2: Remove syncjiff field from quota info
syncjiff is just a converted value of syncms. Some places which
are updating syncms forgot to update syncjiff as well. Since the
conversion is just a simple division / multiplication and it does
not happen frequently, just remove the syncjiff field to avoid
forgotten conversions.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-23 10:59:27 -07:00